specpipe 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. package/README.md +1319 -0
  2. package/bin/devkit.js +3 -0
  3. package/package.json +61 -0
  4. package/src/cli.js +76 -0
  5. package/src/commands/check.js +33 -0
  6. package/src/commands/diff.js +84 -0
  7. package/src/commands/init-adopt.js +54 -0
  8. package/src/commands/init-agents.js +118 -0
  9. package/src/commands/init-global.js +102 -0
  10. package/src/commands/init.js +311 -0
  11. package/src/commands/list.js +54 -0
  12. package/src/commands/remove.js +133 -0
  13. package/src/commands/upgrade.js +215 -0
  14. package/src/lib/agent-guards.js +100 -0
  15. package/src/lib/agent-install.js +161 -0
  16. package/src/lib/agents.js +280 -0
  17. package/src/lib/claude-global.js +183 -0
  18. package/src/lib/detector.js +93 -0
  19. package/src/lib/hasher.js +21 -0
  20. package/src/lib/installer.js +213 -0
  21. package/src/lib/logger.js +16 -0
  22. package/src/lib/manifest.js +102 -0
  23. package/src/lib/reconcile.js +56 -0
  24. package/templates/.claude/CLAUDE.md +79 -0
  25. package/templates/.claude/hooks/comment-guard.js +126 -0
  26. package/templates/.claude/hooks/file-guard.js +216 -0
  27. package/templates/.claude/hooks/glob-guard.js +104 -0
  28. package/templates/.claude/hooks/path-guard.sh +118 -0
  29. package/templates/.claude/hooks/self-review.sh +27 -0
  30. package/templates/.claude/hooks/sensitive-guard.sh +227 -0
  31. package/templates/.claude/settings.json +68 -0
  32. package/templates/docs/WORKFLOW.md +325 -0
  33. package/templates/docs/specs/.gitkeep +0 -0
  34. package/templates/hooks/specpipe-read-guard.sh +42 -0
  35. package/templates/hooks/specpipe-shell-guard.sh +65 -0
  36. package/templates/rules/specpipe-guards.md +40 -0
  37. package/templates/scripts/test-hooks.sh +66 -0
  38. package/templates/skills/sp-build/SKILL.md +776 -0
  39. package/templates/skills/sp-challenge/SKILL.md +255 -0
  40. package/templates/skills/sp-commit/SKILL.md +174 -0
  41. package/templates/skills/sp-explore/SKILL.md +730 -0
  42. package/templates/skills/sp-fix/SKILL.md +266 -0
  43. package/templates/skills/sp-humanize/SKILL.md +212 -0
  44. package/templates/skills/sp-investigate/SKILL.md +648 -0
  45. package/templates/skills/sp-md-render/SKILL.md +200 -0
  46. package/templates/skills/sp-md-render/components.md +415 -0
  47. package/templates/skills/sp-md-render/template.html +283 -0
  48. package/templates/skills/sp-plan/SKILL.md +947 -0
  49. package/templates/skills/sp-review/SKILL.md +268 -0
  50. package/templates/skills/sp-scaffold/SKILL.md +237 -0
  51. package/templates/skills/sp-scaffold/references/ARCHITECTURE.md.tmpl +228 -0
  52. package/templates/skills/sp-scaffold/references/DESIGN.md.tmpl +113 -0
  53. package/templates/skills/sp-scaffold/references/adr/NNNN-template.md +92 -0
  54. package/templates/skills/sp-scaffold/references/stack-profiles/react.md +36 -0
  55. package/templates/skills/sp-spec-render/SKILL.md +254 -0
  56. package/templates/skills/sp-spec-render/components.md +418 -0
  57. package/templates/skills/sp-spec-render/examples/user-auth.html +749 -0
  58. package/templates/skills/sp-spec-render/examples/user-auth.md +114 -0
  59. package/templates/skills/sp-spec-render/template.html +222 -0
  60. package/templates/skills/sp-voices/SKILL.md +1184 -0
package/README.md ADDED
<p align="center">
  <img src="docs/cover.svg" alt="Specpipe — spec-first multi-agent dev toolkit" width="100%">
</p>

<h1 align="center">Specpipe</h1>

<p align="center">
  A lightweight, spec-first development toolkit for AI coding agents.
</p>

It enforces the cycle **spec (with acceptance scenarios) → code + tests → build pass** through skills, always-on guardrails, and a universal test runner.

**Agents:** [Claude Code](https://claude.ai/code) (full hook enforcement), Codex and Cursor (skills + blocking guard hooks), and Antigravity, OpenClaw, and Hermes (skills + advisory guard rules). Install for one or all: `specpipe init --agents <list>|all`. See [docs/multi-agent.md](docs/multi-agent.md).
**Works with:** Swift, TypeScript/JavaScript, Python, Rust, Go, Java/Kotlin, C#, Ruby.
**Dependencies:** None (requires only a supported agent CLI, Node.js, Git, and Bash).
**Optional:** [GraphAtlas](https://github.com/microvn/graphatlas) MCP server for graph-based code intelligence. Six skills use it automatically when it is present and fall back to `grep` when it isn't. See [§3 Setup](#3-setup).

---

## Table of Contents

1. [Philosophy](#1-philosophy)
2. [Quick Start](#2-quick-start)
3. [Setup](#3-setup)
4. [Daily Workflows](#4-daily-workflows)
5. [Commands Reference](#5-commands-reference)
6. [Automatic Guards (Hooks)](#6-automatic-guards-hooks)
7. [Spec Format](#7-spec-format)
8. [Customization](#8-customization)
9. [Token Cost Guide](#9-token-cost-guide)
10. [Troubleshooting](#10-troubleshooting)
11. [FAQ](#11-faq)

---

## 1. Philosophy

### The Core Cycle

```
SPEC (with acceptance scenarios) → CODE + TESTS → BUILD PASS
```

Every code change — feature, fix, or removal — follows this cycle. The spec is the source of truth. Acceptance scenarios (Given/When/Then) are embedded directly in the spec — no separate test plan file. If code contradicts the spec, the code is wrong.
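
Scenarios are plain text in the spec, so they are easy to trace mechanically. The Python sketch below is hypothetical: neither `parse_scenario` nor the regex is part of specpipe. It assumes the one-line `AS-NNN: Given …, when …, then …` form used in the Quick Start transcript. Real specs may spread Given/When/Then over several lines.

```python
import re
from typing import NamedTuple, Optional

# Assumed one-line form, e.g. 'AS-014: Given X, when Y, then Z.'
SCENARIO_RE = re.compile(
    r"^(?P<id>AS-\d{3,}):\s*"
    r"Given\s+(?P<given>.+?),\s*"
    r"when\s+(?P<when>.+?),\s*"
    r"then\s+(?P<then>.+?)\.?$",
    re.IGNORECASE,
)


class Scenario(NamedTuple):
    id: str
    given: str
    when: str
    then: str


def parse_scenario(line: str) -> Optional[Scenario]:
    """Split a one-line acceptance scenario into its Given/When/Then parts."""
    match = SCENARIO_RE.match(line.strip())
    if match is None:
        return None  # not a scenario line (for example, a story heading)
    return Scenario(match["id"], match["given"], match["when"], match["then"])
```

Each part then maps naturally onto the arrange, act, and assert steps of a test.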

### Why Spec-First?

- **Prevents drift.** Acceptance scenarios live inside the spec, so there is no separate test plan to fall out of sync.
- **Tests have purpose.** Scenarios derived from specs test behavior, not implementation details, so the tests survive refactoring.
- **AI writes better code.** When an agent has a spec with concrete Given/When/Then scenarios, it generates more accurate implementations and more meaningful tests.
- **Reviews are grounded.** Reviewers can check code against the spec rather than guessing at intent.

### Principles

1. **Specs are source of truth** — Code changes require spec updates first.
2. **Incremental, not big-bang** — Test after each code chunk, not after everything is done.
3. **Tests travel with code** — Every PR includes production code + tests + spec updates.
4. **Build pass is the gate** — Nothing merges with failing tests.
5. **Everything in the repo** — Specs, plans, tests, and code are version-controlled and reviewable.

---

## 2. Quick Start

**Time needed: 5 minutes.** Below is a realistic transcript: user input, what each skill actually asks, and what it actually outputs. Nothing embellished.

```bash
npx specpipe init .                        # install for Claude Code (default)
# or target other agents:
npx specpipe init . --agents cursor,codex
npx specpipe init . --agents all

claude                                     # open your agent (claude, codex, cursor, …)
```

> The transcript below uses Claude Code. The same skills install for Codex, Cursor,
> Antigravity, OpenClaw, and Hermes — see [§ Supported agents](#supported-agents).

### Step 1 — Spec the feature (`/sp-plan`)

```text
You: /sp-plan "add tag filter to the todo list screen"

Claude → running /sp-plan
  Phase 0: Codebase scan — found docs/specs/todo-list/todo-list.md (exists)
  Phase 1: Scope Challenge — TodoList already supports text search; tag
           filter can reuse the same query pipeline. Flagging: reuse vs rebuild.

  AskUserQuestion: "Extend existing search, or add a separate tag filter UI?"
    A) Extend search — one input, parsed (text + "tag:work")   [lowest cost]
    B) Separate tag chips — chips below the search box         [clearer UX]
    C) Both — chips drive the same query                       [most work]

You: B

Claude: Drafting spec…
  docs/specs/todo-list/todo-list.md (Mode C update, snapshot created)

  ## Stories (added)
  ### S-007: Filter todos by tag (P0)
    AS-014: Given todos tagged [work, home], when user taps "work" chip,
            then only work-tagged todos show.
    AS-015: Given no todos match the selected tags, when filter applied,
            then empty state shows "No todos match these tags".

  What Already Exists: TodoRepository.query() handles predicates — reuse.
  Not in Scope: saving favorite tag combinations (deferred).
```

> **Complex spec?** Run `/sp-challenge docs/specs/<feature>/<feature>.md` before coding.
> It spawns parallel hostile reviewers (Security Adversary, Failure Mode Analyst,
> Assumption Destroyer, Scope & YAGNI Critic) that quote the spec and propose surgical
> edits — e.g. "AS-014 doesn't define tag case-sensitivity → add constraint or AS".
> Skip it for simple CRUD like this tag filter.

### Step 2 — Implement + build (`/sp-build`)

```text
You: (edit TodoListView.swift, TagFilterChip.swift) — now /sp-build

Claude → running /sp-build
  Phase 1.5: Coverage Map
  [+] TodoListView.swift
      └── applyTagFilter(selected:)
          ├── [★★ TESTED] empty selection → returns all (new test)
          ├── [★★ TESTED] single tag match (new test)
          ├── [GAP] no-match empty state (← AS-015)
          └── [★★ TESTED] multi-tag AND semantics (new test)

  Writing test for [GAP]… building… running.
  ✓ 14 tests passed (3 new, 0 regressions)
```

### Step 3 — Bug found? (`/sp-fix`)

```text
You: /sp-fix "selecting two tags shows OR results, should be AND"

Claude → running /sp-fix
  Phase 0: Bug Path Diagram
  applyTagFilter(selected: ["work","urgent"])
      └── [BUG] predicate uses .contains(anyOf:) — should be .contains(allOf:)

  Phase 1: writing failing test (AS-014 variant: 2 tags, AND)… FAILS ✓
  Phase 2: changing .anyOf → .allOf in TodoRepository.swift:88 (1 line)
  Phase 3: test now PASSES, full suite green.

  DEBUG REPORT
    Root cause: wrong set operator, regression from S-005 search refactor
    Prevention: add AS for multi-tag AND to spec (auto-signaled)
```
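
The bug in Step 3 is a set-semantics mix-up. The filter kept todos matching *any* selected tag (OR) when it should have kept only todos with *all* selected tags (AND). The transcript's project is Swift, so the following is only a Python illustration. `Todo`, `filter_any`, and `filter_all` are hypothetical names, not the app's or the kit's code.

```python
from dataclasses import dataclass, field


@dataclass
class Todo:
    title: str
    tags: set[str] = field(default_factory=set)


def filter_any(todos: list[Todo], selected: set[str]) -> list[Todo]:
    """OR semantics (the bug): keep a todo if it has at least one selected tag."""
    if not selected:
        return list(todos)  # empty selection shows everything
    return [t for t in todos if t.tags & selected]


def filter_all(todos: list[Todo], selected: set[str]) -> list[Todo]:
    """AND semantics (the fix): keep a todo only if it has every selected tag.

    An empty selection is a subset of every tag set, so it returns all todos.
    """
    return [t for t in todos if selected <= t.tags]
```

With `{"work", "urgent"}` selected, `filter_any` also returns todos tagged only `work`, while `filter_all` returns only todos carrying both tags. A selection that matches nothing yields an empty list, which is the AS-015 empty state.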
152
+
153
+ ### Step 4 — Review + commit (`/sp-review` → `/sp-commit`)
154
+
155
+ ```text
156
+ You: /sp-review
157
+
158
+ Claude → Smart Focus: UI state + query predicate
159
+ Failure Mode Grid: all paths have tests ✓ error states covered ✓
160
+ Finding (confidence 8/10): selected-tags state not persisted across
161
+ navigation. Not in current spec — logged under "Not in Scope".
162
+ Verdict: APPROVE
163
+
164
+ You: /sp-commit
165
+
166
+ Claude → secret scan: clean. debug-code scan: clean.
167
+ Staging 4 files. Conventional message:
168
+ feat(todo): filter list by selected tags with AND semantics
169
+ ✓ commit a1b2c3d (not pushed — run `git push` when ready)
170
+ ```
171
+
172
+ > **Complex bug?** Insert `/sp-investigate "<bug>"` before `/sp-fix`. It's read-only,
173
+ > writes `docs/investigate/<slug>-<date>.md` with hypotheses + blast radius, then
174
+ > `/sp-fix` auto-picks it up. Skip for trivial bugs.
175
+
176
+ That's the 5 minutes. The CLI auto-detected your project (Swift + XCTest here) — no config touched.
177
+
178
+ ---
179
+
180
+ ## 3. Setup
181
+
182
+ ### Prerequisites
183
+
184
+ | Tool | Required | Why |
185
+ |------|----------|-----|
186
+ | **A supported agent CLI** | Yes | Runs the skills — Claude Code, Codex, Cursor, Antigravity, OpenClaw, or Hermes |
187
+ | **Git** | Yes | Change detection, commit workflow |
188
+ | **Node.js** (18+) | Yes | File guard hook, JSON parsing |
189
+ | **Bash** (4+) | Yes | Path guard hook, shell-based hooks |
190
+ | **Language toolchain** | Yes | Whatever your project uses (Swift, npm, pytest, etc.) |
191
+ | **[GraphAtlas](https://github.com/microvn/graphatlas)** | Optional | Graph-based code intelligence — skills prefer it over `grep` when connected (see below) |
192
+
193
+ ### Installation
194
+
195
+ **Option A: One-command install** (recommended)
196
+
197
+ ```bash
198
+ npx specpipe init .
199
+ ```
200
+
201
+ **Option B: Global install**
202
+
203
+ ```bash
204
+ npm install -g specpipe
205
+
206
+ # Then, in any project:
207
+ cd my-project
208
+ specpipe init .
209
+ ```
210
+
211
+ **Option C: Global skills install** (available in all projects without running `init` again)
212
+
213
+ ```bash
214
+ specpipe init --global
215
+ # or after per-project init, answer "yes" to the global prompt
216
+ ```
217
+
218
+ Skills installed globally at `~/.claude/skills/` are available in every project. Per-project `.claude/skills/` always takes precedence over global — so projects can still override individual skills.
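
The precedence rule amounts to a two-step lookup. This is a minimal Python sketch that assumes the skill directories shown in this README. `resolve_skill` is a hypothetical helper, not something specpipe or Claude Code exposes. The `is_file` parameter exists only so the lookup can be tested without touching disk.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_skill(
    name: str,
    project_root: Path,
    home: Path,
    is_file: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the SKILL.md that wins for `name`: per-project first, then global."""
    candidates = [
        project_root / ".claude" / "skills" / name / "SKILL.md",  # per-project override
        home / ".claude" / "skills" / name / "SKILL.md",  # global install (~/.claude)
    ]
    for path in candidates:
        if is_file(path):
            return path
    return None  # skill not installed anywhere
```

In a real session the call would look like `resolve_skill("sp-plan", Path.cwd(), Path.home())`.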

**Option D: Force re-install** (overwrites existing files)

```bash
npx specpipe init --force .
```

**Option E: Selective install** (only specific components)

```bash
npx specpipe init --only hooks,skills .
```

**Option F: Multi-agent install** (one agent, several, or all)

```bash
npx specpipe init --agents cursor .        # one
npx specpipe init --agents claude,codex .  # several
npx specpipe init --agents all .           # every supported agent
```

### Supported agents

The skills are authored once and emitted into each agent's native format on install.
The markdown body is identical across agents; only the file location, name, and
frontmatter change. Guardrails are **enforced via blocking hooks** for Claude, Codex,
and Cursor (they can deny a tool call); Antigravity, OpenClaw, and Hermes get the same
guard intent as **always-on advisory rules**.

| Agent | Install location | Guardrails |
|-------|------------------|------------|
| **Claude Code** | `.claude/skills/sp-*/SKILL.md` + `.claude/hooks/` | **Enforced** (hooks) |
| **Codex CLI** | `.agents/skills/sp-*/SKILL.md` | **Enforced**: `.codex/hooks.json` + `AGENTS.md` |
| **Cursor** | `.cursor/skills/sp-*/SKILL.md` | **Enforced**: `.cursor/hooks.json` + `.cursor/rules/` |
| **Antigravity** | `.agents/skills/sp-*/SKILL.md` | Advisory: `.agent/rules/` |
| **OpenClaw** | `skills/sp-*/SKILL.md` | Advisory: `SPECPIPE-GUARDS.md` |
| **Hermes** | `optional-skills/specpipe/sp-*/SKILL.md` | Advisory: `SPECPIPE-GUARDS.md` |

Skills that use Claude-only tools (`AskUserQuestion`, subagents) get a "Running outside
Claude Code" note appended for the other agents, so they degrade gracefully. The specs
and workflow themselves are tool-agnostic. Full details: [docs/multi-agent.md](docs/multi-agent.md).
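
As an illustration, here is one way an `--agents` value might expand into per-agent skill paths, sketched in Python. The directory map is copied from the Supported agents table. The identifiers `claude`, `codex`, `cursor`, and `all` appear in this README's examples. The other three lowercase names are assumptions. `parse_agents` and `skill_paths` are hypothetical helpers, not specpipe's CLI code.

```python
# Skill directories per agent, taken from the "Supported agents" table.
# "antigravity", "openclaw", and "hermes" are assumed CLI names.
AGENT_SKILL_DIRS: dict[str, str] = {
    "claude": ".claude/skills",
    "codex": ".agents/skills",
    "cursor": ".cursor/skills",
    "antigravity": ".agents/skills",
    "openclaw": "skills",
    "hermes": "optional-skills/specpipe",
}


def parse_agents(value: str) -> list[str]:
    """Expand an --agents value ("all" or a comma-separated list) into agent names."""
    if value.strip().lower() == "all":
        return list(AGENT_SKILL_DIRS)
    agents: list[str] = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "claude,,codex"
        if name not in AGENT_SKILL_DIRS:
            raise ValueError(f"unknown agent: {name}")
        if name not in agents:
            agents.append(name)  # keep first-seen order, drop duplicates
    return agents


def skill_paths(agents: list[str], skill: str) -> dict[str, str]:
    """Map each agent to the SKILL.md path a given skill would be written to."""
    return {agent: f"{AGENT_SKILL_DIRS[agent]}/{skill}/SKILL.md" for agent in agents}
```

In this sketch, Codex and Antigravity resolve to the same `.agents/skills/` path, mirroring the table.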

### What Gets Installed

The tree below is the **Claude Code** layout (`--agents claude`, the default). Other
agents install the same skills into their own locations — see [Supported agents](#supported-agents).

```
your-project/
├── .specpipe/
│   └── manifest.json                ← install manifest (tracks files per agent; used by upgrade/remove)
├── .claude/
│   ├── CLAUDE.md                    ← Project rules hub
│   ├── settings.json                ← Hook wiring
│   ├── hooks/
│   │   ├── file-guard.js            ← Warns on large files
│   │   ├── path-guard.sh            ← Blocks wasteful Bash paths
│   │   ├── glob-guard.js            ← Blocks broad glob patterns
│   │   ├── comment-guard.js         ← Blocks placeholder comments
│   │   ├── sensitive-guard.sh       ← Blocks access to secrets
│   │   └── self-review.sh           ← Quality checklist on stop
│   └── skills/
│       ├── sp-explore/SKILL.md      ← /sp-explore skill
│       ├── sp-scaffold/             ← /sp-scaffold skill (greenfield bootstrap)
│       │   ├── SKILL.md
│       │   └── references/          ← ARCHITECTURE/DESIGN templates, ADR template,
│       │       │                      stack-profiles/ seeds (copy to ~/.claude or
│       │       │                      ./.claude to customize — bundled copy is overwritten on upgrade)
│       │       ├── ARCHITECTURE.md.tmpl
│       │       ├── DESIGN.md.tmpl
│       │       ├── adr/NNNN-template.md
│       │       └── stack-profiles/react.md
│       ├── sp-plan/SKILL.md         ← /sp-plan skill
│       ├── sp-challenge/SKILL.md    ← /sp-challenge skill
│       ├── sp-build/SKILL.md        ← /sp-build skill
│       ├── sp-investigate/SKILL.md  ← /sp-investigate skill (optional, read-only)
│       ├── sp-fix/SKILL.md          ← /sp-fix skill
│       ├── sp-review/SKILL.md       ← /sp-review skill
│       ├── sp-commit/SKILL.md       ← /sp-commit skill
│       ├── sp-spec-render/          ← /sp-spec-render skill (spec HTML view, user-invoked)
│       │   ├── SKILL.md
│       │   ├── template.html
│       │   ├── components.md
│       │   └── examples/
│       ├── sp-md-render/            ← /sp-md-render skill (generic markdown HTML view)
│       │   ├── SKILL.md
│       │   ├── template.html
│       │   └── components.md
│       ├── sp-voices/SKILL.md       ← /sp-voices skill (multi-LLM review)
│       └── sp-humanize/SKILL.md     ← /sp-humanize skill (rephrase to human voice)
└── docs/
    ├── specs/                       ← Your specs (folder-per-feature)
    │   └── <feature>/
    │       ├── <feature>.md         ← Spec with acceptance scenarios
    │       └── snapshots/           ← Version history (managed by /sp-plan)
    └── WORKFLOW.md                  ← Process reference
```
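
The blocking hooks in `.claude/hooks/` plug into Claude Code's hook mechanism. For a `PreToolUse` hook, Claude Code sends the pending tool call as JSON on stdin. Exit code 2 blocks the call and feeds stderr back to the agent. The Python sketch below shows that shape for a secrets guard. It is not the kit's `sensitive-guard.sh`, and its deny-list is an assumption.

```python
import json
import re
import sys
from pathlib import PurePath

# Assumed deny-list: dotenv files and common private-key file names.
SENSITIVE_NAME = re.compile(r"^(\.env(\..+)?|id_rsa|id_ed25519|.+\.pem|.+\.key)$")
ALLOWED_NAMES = {".env.example"}  # templates without real secrets stay readable


def is_sensitive(path: str) -> bool:
    """True if the file name looks like it holds secrets."""
    name = PurePath(path).name
    return name not in ALLOWED_NAMES and SENSITIVE_NAME.match(name) is not None


def run_hook(stdin_text: str) -> int:
    """Return the hook's exit code: 2 blocks the tool call, 0 lets it proceed."""
    event = json.loads(stdin_text or "{}")
    path = event.get("tool_input", {}).get("file_path", "")
    if path and is_sensitive(path):
        print(f"Blocked: {path} may contain secrets.", file=sys.stderr)
        return 2
    return 0
```

As a standalone script registered under `PreToolUse` in `settings.json`, it would end with `sys.exit(run_hook(sys.stdin.read()))`. A real guard would also have to inspect Bash commands such as `cat .env`, which this sketch ignores.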

### Optional: GraphAtlas Code Intelligence

The `sp-*` skills work out of the box with `grep`. But when [GraphAtlas](https://github.com/microvn/graphatlas) (GA) is connected as an MCP server, six skills — `/sp-explore`, `/sp-plan`, `/sp-build`, `/sp-fix`, `/sp-review`, `/sp-investigate` — prefer it over `grep` for code discovery, call-graph tracing, and blast-radius analysis.

**Why it helps:** `grep` can't tell a call site from a string literal, doesn't see polymorphic dispatch, and won't follow re-exports. An agent that edits one function but misses its callers, test files, and overrides in other modules ships a bug. GA indexes the repo once into a local graph with typed `CALL` / `IMPORT` / `OVERRIDE` edges, then answers structural questions deterministically in milliseconds with a small token footprint. It runs 100% locally — no LLM, no embeddings, no telemetry.
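
The following is a small illustration of the first point. GraphAtlas builds its own language-aware graph index. The Python sketch below only uses the standard-library `ast` module to contrast a word-boundary text search (roughly what `grep -w` does) with a syntax-aware count of real call sites. The sample source and both helper names are made up.

```python
import ast
import re

# Made-up sample module: one definition, one string mention, one real call.
SOURCE = '''
def charge(order):
    return order.total

def checkout(order):
    log("calling charge now")  # mention inside a string literal
    return charge(order)
'''


def grep_hits(source: str, name: str) -> int:
    """Lines where `name` appears as a whole word, roughly like `grep -w`."""
    word = re.compile(rf"\b{re.escape(name)}\b")
    return sum(1 for line in source.splitlines() if word.search(line))


def call_sites(source: str, name: str) -> list[int]:
    """Line numbers of actual calls to a plain function `name` in the syntax tree."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )
```

Text search reports three hits for `charge`: the definition, the string literal, and the call. The syntax tree yields a single call site. Following dispatch and re-exports across modules additionally needs a cross-file graph, which is what a tool like GA maintains.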

**How the skills use it:** Each skill runs a one-time probe (`ga_architecture`) at the start. If GA responds, the skill leans on tools like `ga_impact` (blast radius + affected tests), `ga_callers` / `ga_callees` (call graph), `ga_symbols` (definition lookup), and `ga_rename_safety`. If GA is absent, or the index is stale, the skill falls back to `grep`/`glob` automatically. Nothing breaks; you only lose the precision.

**Setup:** GA is a separate tool, not bundled with this kit. Install and register it as an MCP server following the instructions at [github.com/microvn/graphatlas](https://github.com/microvn/graphatlas). Once it is registered, the skills detect it on their own; no changes to this kit's config are needed.

### Post-Install Configuration

The CLI auto-detects your project type and fills in `CLAUDE.md`. Verify it's correct:

```bash
cat .claude/CLAUDE.md
```

Look for the **Project Info** section. Ensure language, test framework, and directories are correct. Edit manually if needed.

### Upgrade

```bash
npx specpipe upgrade
```

This is a smart upgrade: it updates kit files but preserves any you've customized. Use `--force` to overwrite everything.

```bash
# Check if an update is available
npx specpipe check

# See what changed
npx specpipe diff

# View installed files and status
npx specpipe list
```
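
Here is one plausible way to implement "preserve what you've customized", sketched in Python. The package ships `src/lib/hasher.js` and `src/lib/manifest.js`, but their actual logic isn't reproduced here. The sketch assumes the manifest records a SHA-256 of each kit file at install time. `plan_upgrade` is a hypothetical name.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Hex SHA-256 of file contents."""
    return hashlib.sha256(data).hexdigest()


def plan_upgrade(
    manifest: dict[str, str],
    on_disk: dict[str, bytes],
    force: bool = False,
) -> dict[str, str]:
    """Decide an action for each kit file listed in the manifest.

    manifest: relative path -> hash recorded at install time (assumed format)
    on_disk:  relative path -> current bytes, for files that still exist
    Returns "install" (missing), "update" (unmodified or forced), or "skip" (customized).
    """
    actions: dict[str, str] = {}
    for path, installed_hash in manifest.items():
        current = on_disk.get(path)
        if current is None:
            actions[path] = "install"
        elif force or sha256(current) == installed_hash:
            actions[path] = "update"  # safe: nothing local would be lost
        else:
            actions[path] = "skip"  # edited since install: keep the user's version
    return actions
```

If a file's current hash still matches the recorded one, the file was never edited, so overwriting it loses nothing. A mismatch means local edits, which are skipped unless `--force` is given. Re-installing missing files is a choice made by this sketch, not documented behavior.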

### Uninstall

```bash
npx specpipe remove
```

This removes hooks, skills, and settings. It preserves `CLAUDE.md` (which you may have customized) and `docs/` (which contains your specs).

---

## 4. Daily Workflows

### New Project (Greenfield)

> When: Brand-new project with no codebase yet (empty repo, no package manager or `src/`).

```
1. /sp-explore "what you're building"
   → Detects greenfield, decides the app type + stack (researched against current
     versions), and emits a Bootstrap Brief in docs/explore/<feature>.md.

2. /sp-scaffold
   → Generator-first runnable skeleton (core/ + one pattern-demonstrating module +
     tests), smoke-gated (install → build → start must be GREEN), plus ARCHITECTURE.md / ADRs.
     Hands off only when it RUNS.

3. /sp-plan → /sp-build → normal New Feature flow, now on a runnable base.
```

### Explore Before Planning

> When: Requirements are unclear, you're debating between approaches, or it's a brownfield feature with existing code to understand first.

```
1. /sp-explore "feature description"
   → Asks questions as a Client Technical Lead, one topic at a time.
   → Clarifies: why, behavior, boundaries, business rules, edge cases, permissions, UI.
   → Output: docs/explore/<feature>.md

2. /sp-plan "feature description"
   → Auto-detects docs/explore/<feature>.md and skips redundant discovery.
   → Continue with the normal New Feature flow.
```

**Example:**
```
/sp-explore "cancel order request"
```

### New Feature

> When: Building something new, with no existing code or spec.

```
1. /sp-plan "description of the feature"
   → Generates a spec with acceptance scenarios at docs/specs/<feature>/<feature>.md.

2. Implement code in chunks.
   After each chunk: /sp-build
   Repeat until green.

3. /sp-review (before merge)

4. /sp-commit
```

**Example:**
```
/sp-plan "User authentication with email/password login, password reset via email, and session management with 24h expiry"
```

### Update Existing Feature

> When: Changing behavior of something that already exists.

```
1. /sp-plan docs/specs/<feature>/<feature>.md "description of changes"
   → Mode C handles everything: classification → snapshot (if Major) → change report → apply.
   Do NOT manually edit the spec before running /sp-plan.

2. Implement the code change.
   /sp-build
   Fix until green.

3. /sp-review → /sp-commit
```

### Bug Fix

> When: Something is broken.

```
0. (OPTIONAL) /sp-investigate "description of the bug"
   → Use for complex bugs, outages, data corruption, or when the cause is unclear.
   → Read-only: hypothesis + blast radius + evidence, no code changes.
   → Writes docs/investigate/<slug>-<date>.md for /sp-fix to consume.
   → Skip for trivial/obvious bugs and go straight to /sp-fix.

1. /sp-fix "description of the bug"   (or /sp-fix docs/investigate/<slug>-<date>.md)
   → Writes failing test → fixes code → runs full suite.

2. /sp-commit
```

**Example:**
```
/sp-fix "Search returns no results when query contains apostrophes like O'Brien"
```
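
The README doesn't say what caused the O'Brien bug. Assuming a common cause (SQL assembled by string interpolation), the failing-test-first loop that `/sp-fix` follows might look like the Python/SQLite sketch below. The table and function names are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table with one name that contains an apostrophe."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?)", [("O'Brien",), ("Obama",)])
    return conn


def search_buggy(conn: sqlite3.Connection, query: str) -> list[str]:
    """Interpolates the query into SQL; an apostrophe ends the string literal early."""
    sql = f"SELECT name FROM people WHERE name LIKE '%{query}%'"
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        return []  # error swallowed, so the user just sees "no results"
    return [row[0] for row in rows]


def search_fixed(conn: sqlite3.Connection, query: str) -> list[str]:
    """Binds the query as a parameter, so quotes in user input need no escaping."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ?", (f"%{query}%",)
    ).fetchall()
    return [row[0] for row in rows]
```

The regression test asserts that searching for `O'Brien` returns `["O'Brien"]`. That test fails against `search_buggy`, which swallows the syntax error and returns nothing. It passes once the query uses a bound parameter.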

### Remove Feature

> When: Deleting code or removing deprecated functionality.

```
1. /sp-plan docs/specs/<feature>/<feature>.md "remove stories S-XXX"
   → Mode C creates a snapshot (removing stories = Major), then marks the stories as removed.

2. Delete production code + related tests.

3. Run the full test suite (your project's native test command).
   Fix cascading breaks.

4. /sp-commit
```

---

## 5. Commands Reference

### /sp-explore — Feature Discovery as Client Technical Lead

**Usage:**
```
/sp-explore "cancel order request"
/sp-explore "user notification preferences"
```

**When to use:** Requirements are unclear, you're debating between approaches, or you want to clarify a feature deeply before committing to a spec. Runs before `/sp-plan`.

**How it works:**

1. **Phase 0: Codebase scan** — Silently checks for existing code, related specs, and existing explore docs before asking anything.
2. **Phase 1: Why, not what** — Asks what problem requires this feature, who faces it, and how they handle it today. Prevents building the wrong thing.
3. **Phase 2: Desired behavior** — Walks through the flow step by step, identifies the trigger and final result, and checks for multi-role approval chains.
4. **Phase 2.5: UI/UX expectation** — Clarifies interface type (table, form, wizard, dashboard). Offers sensible defaults when the client is unsure. Suggests simpler approaches when expectations are complex.
5. **Phase 3: Boundaries** — Impact on existing screens, data changes, migration needs, out-of-scope items, permissions.
6. **Phase 3.5: Scope optimization** — Identifies what can ship fast vs. what can be deferred to phase 2.
7. **Phase 4: Business rules & validation** — Conditions, formulas (with real numbers), input validation, notifications, time constraints, concurrency.
8. **Phase 5: Edge cases** — Empty states, error messages, double submit, network loss, limits, sensitive data, domain-specific cases (payment double-charge, booking overbooking, etc.).
9. **Phase 6: Scenario confirmation** — Presents a concrete happy path and unhappy paths with fake data, and confirms with the user before proceeding.
10. **Phase 7: Handoff summary** — Compiles everything into a structured doc, confirms it with the user, and writes it to `docs/explore/<feature>.md`.

**Output:** `docs/explore/<feature>.md` — auto-detected by `/sp-plan`, which skips redundant discovery and maps explore findings directly to spec sections.

**Token cost:** 10–20k

---

### /sp-scaffold — Greenfield Project Bootstrap

**Usage:**
```
/sp-scaffold                                 # bootstrap from the Bootstrap Brief in docs/explore/
/sp-scaffold "Next.js + Nest pnpm monorepo"  # standalone: gather app-type/stack itself
```

**When to use:** A brand-new project with no runnable codebase yet. Runs between `/sp-explore` (greenfield branch) and `/sp-plan`: `sp-explore → sp-scaffold → sp-plan → sp-build`. Skip it if a runnable project already exists and go straight to `/sp-plan`. `/sp-build`'s Foundation Gate refuses to start the TDD loop until this skill has produced a runnable harness.

**How it works:**

1. **Precondition** — confirms greenfield; resumes a partial repo without clobbering user files.
2. **App-type + stack** — taken from the Bootstrap Brief (or asked); never silently defaulted; **current versions researched**, not recalled from training memory. Optional layered stack profiles (`./.claude/` > `~/.claude/` > kit seed) supply opinionated defaults; the Brief always wins.
3. **Skeleton (generator-first)** — official `create-*` CLIs give real pinned deps (defends against hallucinated/typosquatted packages); monorepos orchestrated root-first; imposes `core/` + `modules/` + co-located tests; seeds ONE module that **demonstrates the architecture pattern** (the template every feature copies).
4. **Smoke gate (non-negotiable)** — `install → build → start/smoke` must be GREEN, with ≥1 real passing test (this resolves `TEST_CMD` for `/sp-build`). Not green → BLOCKED; never a half-scaffold.
5. **Docs** — fills `ARCHITECTURE.md` (codemap + invariants), one ADR per major stack choice, optional `DESIGN.md`.
6. **Hygiene & handoff** — secret scan, `.gitignore`, `.env.example`; reports the resolved `TEST_CMD`.

**Output:** a runnable walking skeleton + canonical docs. Thin by design — features come later via `/sp-plan` → `/sp-build`.

**Token cost:** 15–40k + real install/build time (heavier than other skills, because it runs generators and builds).

---

### /sp-plan — Generate Spec with Acceptance Scenarios

**Usage:**
```
/sp-plan "user authentication with OAuth2"                  # Mode A: new spec from description
/sp-plan docs/specs/auth/auth.md                            # Mode B: add scenarios to existing spec
/sp-plan docs/specs/auth/auth.md "add password reset flow"  # Mode C: update existing spec
```

**Modes:**
- **Mode A** — Creates a new spec with stories and acceptance scenarios from your description.
- **Mode B** — Reads an existing spec that has no acceptance scenarios yet and adds them.
- **Mode C** — Updates an existing spec: creates a snapshot before Major changes, shows a change report, waits for confirmation, then applies.

**How it works:**

1. **Phase 0: Codebase Awareness** — Scans existing code, `docs/specs/`, and project patterns before planning. Prevents specs that conflict with existing implementations.
2. **Phase 1: Scope & Split + Scope Challenge** — Evaluates feature size (>7 stories or >20 AS → must split).
   - **Sizing & Phasing** (large features): Phase 1 (minimum viable — smallest slice with value), Phase 2 (core experience — happy path), Phase 3 (edge cases, polish), Phase 4 (optimization, monitoring). Each phase is independently mergeable.
   - **Scope Challenge** (before drafting): checks for existing code that already solves sub-problems (reuse vs rebuild), flags complexity smells (8+ files or 2+ new classes/services), searches for framework built-ins, checks for distribution needs (new artifact → CI/CD in scope?), and applies the Completeness Principle (if the complete version costs only `CC: ≤15m` more, recommend it directly).
3. **Phase 2: Draft Spec** — Generates a structured spec with stories and acceptance scenarios (Given/When/Then). Depth scales by priority: P0 gets full GWT + test data, P1 gets GWT, P2 gets 1-2 line descriptions. Runs consistency checks (CC1-CC6) before showing the draft.
4. **Phase 3: Clarify Ambiguities** — Systematically finds gaps across behavioral, data, auth, non-functional, integration, and concurrency dimensions. Each option in a question carries a `(human: ~X / CC: ~Y)` effort estimate (human time vs. Claude Code time) and a `Completeness: X/10` score.
5. **Phase 4: Summary** — Shows story counts, AS counts, implementation order, next steps. Every spec also gets a **"What Already Exists"** section (existing code that partially solves the problem) and a **"Not in Scope"** section (deferred work with rationale — prevents work from silently dropping).
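
`/sp-plan` applies these size limits through its prompt rather than through code. The Phase 1 thresholds are still concrete enough to write down. Below is a hypothetical Python encoding; the `SpecSize` fields are illustrative.

```python
from dataclasses import dataclass

MAX_STORIES = 7  # more than 7 stories -> must split
MAX_SCENARIOS = 20  # more than 20 acceptance scenarios -> must split


@dataclass
class SpecSize:
    stories: int
    scenarios: int
    files_touched: int = 0
    new_classes: int = 0  # new classes or services


def must_split(size: SpecSize) -> bool:
    """Phase 1 hard rule: oversized features are split before drafting."""
    return size.stories > MAX_STORIES or size.scenarios > MAX_SCENARIOS


def complexity_smells(size: SpecSize) -> list[str]:
    """Scope Challenge warnings: flags for discussion, not automatic rejections."""
    smells: list[str] = []
    if size.files_touched >= 8:
        smells.append("touches 8+ files")
    if size.new_classes >= 2:
        smells.append("adds 2+ new classes/services")
    return smells
```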

**Mode C (Update) adds:**
- **Classification** — Walks through the M1-M6 checklist to determine whether the change is Major or Minor.
- **Snapshot** — Major changes trigger an automatic snapshot (`cp`, bit-perfect) before editing.
- **Change report** — Shows what will change, waits for user confirmation.
- **Consistency check** — Runs CC1-CC6 after every update.

**Traceability IDs:**
- `S-NNN` — Stories (with priority P0/P1/P2)
- `AS-NNN` — Acceptance Scenarios (Given/When/Then, embedded in stories)
- `FR-NNN` — Functional Requirements (if needed)
- `SC-NNN` — Success Criteria (if needed)
- IDs are immutable — deleted IDs are never reused.
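
The "never reused" rule means the next ID must come from the highest number ever issued, not from the count of live items. The Python sketch below assumes that removed stories and scenarios stay in the spec marked as removed (as the Remove Feature workflow describes), so their IDs remain visible. `next_id` is hypothetical.

```python
import re


def next_id(spec_text: str, prefix: str) -> str:
    """Next unused traceability ID for a prefix such as "S", "AS", "FR", or "SC".

    Takes the highest number ever issued, so gaps left by removed items are
    never filled (assumes removed items stay in the spec, marked as removed).
    """
    pattern = rf"\b{re.escape(prefix)}-(\d{{3,}})\b"
    numbers = [int(n) for n in re.findall(pattern, spec_text)]
    return f"{prefix}-{max(numbers, default=0) + 1:03d}"
```

The leading `\b` keeps `S-` from matching inside `AS-014`, so each prefix is counted separately.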

**Directory structure:**
```
docs/specs/<feature>/
  <feature>.md        # single source of truth — always read this file
  snapshots/          # version history (managed by sp-plan, not developers)
    YYYY-MM-DD.md
    YYYY-MM-DD-<REF>.md
```
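
The snapshot layout can be expressed as a small path rule plus a byte-for-byte copy. In this Python sketch, `snapshot_path` and `take_snapshot` are hypothetical. `<REF>` is modeled as an optional free-form string because the README doesn't define it.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def snapshot_path(spec: Path, day: date, ref: Optional[str] = None) -> Path:
    """snapshots/YYYY-MM-DD.md next to the spec, or YYYY-MM-DD-<REF>.md with a ref."""
    stem = day.isoformat() if ref is None else f"{day.isoformat()}-{ref}"
    return spec.parent / "snapshots" / f"{stem}.md"


def take_snapshot(spec: Path, day: date, ref: Optional[str] = None) -> Path:
    """Copy the spec byte-for-byte (like `cp`) before a Major edit."""
    target = snapshot_path(spec, day, ref)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(spec, target)  # overwrites an existing snapshot with the same name
    return target
```

`shutil.copyfile` copies the bytes unchanged, matching the `cp` behavior described under Mode C. Two snapshots on the same day without a ref would collide in this sketch; the real skill may handle that differently.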
582
+
583
+ **Output:**
584
+ - Spec with acceptance scenarios: `docs/specs/<feature>/<feature>.md`
585
+ - (Optional) Scannable HTML view: `docs/specs/<feature>/<feature>.html` — generated by running `/sp-spec-render <feature>` after `/sp-plan`. `/sp-plan` suggests the command at the end of Phase 4 and Mode C but does not invoke it. Source `.md` remains canonical; HTML is regenerable.
586
+
587
+ ### /sp-spec-render — Render Spec as HTML View
588
+
589
+ **Usage:**
590
+ ```
591
+ /sp-spec-render <feature> # render by feature slug
592
+ /sp-spec-render docs/specs/auth/auth.md # render specific spec
593
+ /sp-spec-render docs/specs/billing/ # render spec dir
594
+ /sp-spec-render --all # bulk re-render all specs
595
+ /sp-spec-render # list + prompt
596
+ ```
597
+
598
+ **When to use:** Decoupled from `/sp-plan` — you invoke it explicitly when you want the HTML view. `/sp-plan` writes the spec markdown and ends; it suggests `/sp-spec-render` at the end of Phase 4 and Mode C but never calls it automatically. Run it:
599
+ - After `/sp-plan` to generate the initial HTML view (sidebar TOC, story cards, collapsible AS)
600
+ - After a Mode C update to refresh a now-stale `.html`
601
+ - After fixing a typo directly in `<feature>.md` (no spec semantics changed, but HTML is stale)
602
+ - For specs written before this skill existed
603
+ - Bulk (`--all`) after changing `template.html` or `components.md`
604
+
605
+ **How it works:**
606
+
607
+ 1. Reads `docs/specs/<feature>/<feature>.md` (+ sub-specs if multi-spec).
608
+ 2. Reads `template.html` + `components.md` (cached, not regenerated each call).
609
+ 3. Parses spec: frontmatter, stories with priority badges, acceptance scenarios (Given/When/Then), constraints, change log, snapshots.
610
+ 4. Builds the HTML buffer in-memory using component snippets — copy verbatim, fill content. AI never writes CSS or component markup from scratch.
611
+ 5. Writes `<feature>.html` next to `<feature>.md` in one Write call.
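The skill does its parsing through the model, not through code. Even so, the data that step 3 extracts for story cards can be approximated in a few lines. This sketch assumes story headings follow the section 7 template (`### S-NNN: <name> (P0)`) and that each scenario line starts with `AS-NNN:`. The function name and return shape are illustrative only.

```javascript
// Illustrative parser: collect stories and their acceptance scenarios,
// i.e. the data behind the priority badge and AS-count badge on each card.
function parseStories(markdown) {
  const stories = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const story = /^### (S-\d{3,}): (.+?) \((P[012])\)\s*$/.exec(line);
    if (story) {
      current = { id: story[1], name: story[2], priority: story[3], scenarios: [] };
      stories.push(current);
      continue;
    }
    // Any other h1-h3 heading (e.g. "## Constraints & Invariants") ends the story.
    if (/^#{1,3} /.test(line)) {
      current = null;
      continue;
    }
    const scenario = /^(AS-\d{3,}):/.exec(line);
    if (scenario && current) current.scenarios.push(scenario[1]);
  }
  return stories;
}

const spec = [
  "## Stories",
  "### S-001: Log in (P0)",
  "AS-001: valid password",
  "AS-002: wrong password",
  "### S-002: Remember me (P1)",
  "AS-003: cookie persists",
  "## Constraints & Invariants",
].join("\n");

for (const s of parseStories(spec)) {
  console.log(`${s.id} [${s.priority}] ${s.name}: ${s.scenarios.length} AS`);
}
```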
612
+
613
+ **Output features (the rendered HTML):**
614
+
615
+ - Sticky top bar: doc type + feature name + version + last-updated + counts (specs / stories / AS) + status pill (Active/Draft/Deprecated)
616
+ - Mandatory TL;DR card immediately after the title
617
+ - Sidebar TOC with scroll-spy + search filter, grouped by sub-spec (multi-spec) or by section (single)
618
+ - Story cards with priority badge (P0/P1/P2) + AS count badge
619
+ - AS as collapsible details (first AS of each story open by default), with Given/When/Then grid
620
+ - Constraint callouts (warning style), grouped per sub-spec for large specs
621
+ - Change Log and Snapshots collapsed by default
622
+ - Dark/light/auto theme toggle (system preference honored)
623
+ - Print stylesheet (sidebar hidden, all details expanded, page-break-aware)
624
+ - Self-contained: zero external dependencies, no CDN, opens offline
625
+
626
+ **Source remains truth:**
627
+ - `.md` is canonical. Edit `.md` via `/sp-plan`; regenerate `.html` via this skill.
628
+ - Never hand-edit the `.html`. Re-rendering is idempotent — run `/sp-spec-render` any time you want the HTML to catch up with the `.md`.
629
+
630
+ **Token cost:** 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
631
+
632
+ ### /sp-md-render — Render Any Markdown as HTML View
633
+
634
+ Generic counterpart to `/sp-spec-render`. Same template/component architecture, but for arbitrary long-form markdown with no fixed schema — investigation reports, explore docs, RFCs, retros, design notes, READMEs.
635
+
636
+ **Usage:**
637
+ ```
638
+ /sp-md-render docs/investigate/payment-bug-2026-05-16.md # render next to source
639
+ /sp-md-render <file.md> --out report.html # custom output path
640
+ /sp-md-render docs/notes/ # list + prompt
641
+ /sp-md-render # prompt for path
642
+ ```
643
+
644
+ **When to use:** Any non-spec markdown you want as a scannable, shareable single HTML file. It refuses spec files (those with `### S-NNN:` story headings) and points you to `/sp-spec-render` instead.
645
+
646
+ **How it works:** Reads source + `template.html` + `components.md`, then uses an *analyzer pattern* (not fixed parsing) — each markdown chunk is mapped to the best component: numbered actions → step cards, GFM admonitions → callouts, ` ```mermaid ` → diagrams, pros/cons → compare cards, long appendices → collapsible. Builds the buffer in-memory, writes once.
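The model makes the analyzer's mapping as a judgement call, but a crude rule-based version shows the idea. It also shows the spec-file refusal check. This sketch assumes the document has already been split into chunks on blank lines. The component names are hypothetical labels, not keys from `components.md`.

```javascript
// Spec files carry story headings such as "### S-001: Login (P0)".
function isSpecFile(markdown) {
  return /^### S-\d{3,}:/m.test(markdown);
}

// Rough approximation of the chunk → component mapping (first line decides).
function pickComponent(chunk) {
  const first = chunk.split("\n")[0];
  if (/^```mermaid\b/.test(first)) return "diagram";
  if (/^> \[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]/.test(first)) return "callout"; // GFM alerts
  if (/^\d+\. /.test(first)) return "step-cards";
  if (/^\*\*(Pros|Cons)\b/i.test(first)) return "compare-cards";
  if (/^```/.test(first)) return "code-block";
  return "prose";
}

const doc = "> [!WARNING]\n> Back up first.\n\n1. Stop the worker\n2. Run the migration";
if (isSpecFile(doc)) {
  console.log("Spec detected: use /sp-spec-render instead");
} else {
  for (const chunk of doc.split(/\n\s*\n/)) {
    console.log(pickComponent(chunk)); // "callout", then "step-cards"
  }
}
```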
647
+
648
+ **Output features:** sidebar TOC + scroll-spy + search, anchored headings with copy-link, code blocks with copy button + language label, Mermaid diagrams (CDN), 4-variant callouts (note/tip/warn/danger), step cards, compare cards, task lists, footnotes, figure+caption, dark/light/auto theme, scroll progress bar, mobile drawer, print stylesheet. Self-contained except for Mermaid, which loads from a CDN.
649
+
650
+ **Token cost:** 3–8k (template + components cached; output ≈ source markdown × 1.2 — no CSS/JS in output token stream).
651
+
652
+ ### /sp-challenge — Adversarial Plan Review
653
+
654
+ **Usage:**
655
+ ```
656
+ /sp-challenge docs/specs/auth/auth.md # challenge a spec
657
+ /sp-challenge "user authentication" # challenge by feature name
658
+ ```
659
+
660
+ **How it works (7 phases):**
661
+
662
+ 1. **Read & Map** — Reads the spec (including acceptance scenarios) and maps: decisions made, assumptions (stated AND implied), dependencies, scope boundaries, risk acknowledgments, story-AS consistency.
663
+ 2. **Scale Reviewers** — Assesses complexity and selects reviewers:
664
+
665
+ | Complexity | Signals | Reviewers |
666
+ |------------|---------|-----------|
667
+ | Simple | 1 spec section, <20 acceptance scenarios, no auth/data | 2 |
668
+ | Standard | Multiple sections, auth or data involved | 3 |
669
+ | Complex | Multiple integrations, concurrency, migrations, 6+ phases | 4 |
670
+
671
+ 3. **Spawn Reviewers** — Launches parallel subagents, each with an adversarial lens:
672
+
673
+ - **Security Adversary**
674
+ - OWASP Top 10
675
+ - Injection vectors
676
+ - Auth/authz bypass
677
+ - Crypto issues
678
+ - Data exposure
679
+ - Supply chain risks
680
+
681
+ - **Failure Mode Analyst** — *"Everything that can go wrong, will — simultaneously, at 3 AM, during peak traffic"*
682
+ - Partial failures
683
+ - Concurrency & race conditions
684
+ - Cascading failures
685
+ - Recovery paths
686
+ - Idempotency
687
+ - Observability gaps
688
+
689
+ - **Assumption Destroyer** — *"'It should work' is not evidence"*
690
+ - Unverified claims
691
+ - Scale assumptions
692
+ - Environment differences
693
+ - Integration contracts
694
+ - Data shape assumptions
695
+ - Timing dependencies
696
+ - Hidden dependencies
697
+
698
+ - **Scope & YAGNI Critic** — *"The best code is no code. The best feature is the one you didn't build"*
699
+ - Over-engineering
700
+ - Premature abstraction
701
+ - Missing MVP cuts
702
+ - Gold plating
703
+ - Simpler alternatives
704
+
705
+ 4. **Deduplicate & Rate** — Collects all findings, removes duplicates, and rates severity using a Likelihood × Impact matrix. Each reviewer is limited to its top 7 findings. The merged list is capped at 15: all Critical findings are kept, High findings are ranked by specificity, and the report notes how many Medium findings were dropped.
706
+
707
+ 5. **Adjudicate** — Evaluates each finding: Accept (valid flaw, plan should change) or Reject (false positive, acceptable risk, already handled). 1-sentence rationale for each.
708
+
709
+ 6. **User Choice** — Two modes: "Apply all accepted" (fast) or "Review each" (walk through one by one).
710
+
711
+ 7. **Apply** — Surgical edits only to accepted findings. Doesn't rewrite surrounding sections.
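The triage in step 4 can be sketched as a small function. Only the limits come from the description above: 7 findings per reviewer, 15 overall, every Critical kept, and dropped Medium findings counted. The 1–3 likelihood/impact scale, the severity cut-offs, the duplicate key (location + lowercased title) and the numeric `specificity` field are all assumptions made for illustration.

```javascript
// Illustrative /sp-challenge step 4 triage; scales and fields are assumed.
const LEVEL = { low: 1, medium: 2, high: 3 };
const score = (f) => LEVEL[f.likelihood] * LEVEL[f.impact]; // Likelihood × Impact, 1..9

function severity(f) {
  const s = score(f);
  if (s >= 9) return "Critical";
  if (s >= 6) return "High";
  if (s >= 3) return "Medium";
  return "Low";
}

function triage(reviewers, cap = 15, perReviewer = 7) {
  const unique = new Map();
  for (const findings of reviewers) {
    // Each reviewer contributes at most its top `perReviewer` findings.
    const top = [...findings].sort((a, b) => score(b) - score(a)).slice(0, perReviewer);
    for (const f of top) {
      const key = `${f.location}|${f.title.toLowerCase()}`; // naive duplicate detection
      if (!unique.has(key)) unique.set(key, { ...f, severity: severity(f) });
    }
  }
  const all = [...unique.values()];
  const pick = (level) => all.filter((f) => f.severity === level);
  const critical = pick("Critical"); // always kept, even beyond the cap
  const high = pick("High").sort((a, b) => b.specificity - a.specificity);
  const medium = pick("Medium");
  const kept = [...critical, ...high].slice(0, Math.max(cap, critical.length));
  const room = Math.max(0, cap - kept.length);
  kept.push(...medium.slice(0, room));
  return { kept, droppedMedium: medium.length - Math.min(room, medium.length) };
}

const finding = (title, likelihood, impact, specificity = 0) =>
  ({ title, location: "auth.md#S-001", likelihood, impact, specificity });

const result = triage([
  [finding("Token replay", "high", "high"), finding("No rate limit", "medium", "high", 2)],
  [finding("token replay", "high", "high"), finding("Verbose errors", "medium", "medium")],
]);
console.log(result.kept.map((f) => `${f.severity}: ${f.title}`));
```

In this sketch, Low findings are neither kept nor counted. A real adjudication step would still read every finding before discarding it.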
712
+
713
+ **Finding format:** Each finding includes Title, Severity, **Confidence score** (9-10 = verified; 7-8 = strong match; 5-6 = note caveat; ≤4 = omit unless Critical), Location, Flaw description, Evidence (direct quote from the plan), step-by-step Failure scenario, and Suggested fix.
714
+
715
+ **6 non-negotiable rules:**
716
+ 1. Spawn reviewers in parallel (not sequential)
717
+ 2. Reviewers read files directly, not summarized content
718
+ 3. Be hostile — no praise, no softening
719
+ 4. Every finding must quote the plan directly as evidence
720
+ 5. Quality over quantity — 3 honest findings > 15 padded ones
721
+ 6. Skip style/formatting — substance only
722
+
723
+ **When to use:**
724
+ - After `/sp-plan`, before coding — for complex features
725
+ - Features involving auth, payments, data pipelines, multi-service integration
726
+ - NOT needed for simple CRUD, small bug fixes, or trivial features
727
+
728
+ **Token cost:** 15-30k (uses parallel subagents, doesn't bloat main context)
729
+
730
+ ### /sp-build — TDD Delivery Loop
731
+
732
+ **Usage:**
733
+ ```
734
+ /sp-build # build all changes vs base branch
735
+ /sp-build src/api/users.ts # build specific file
736
+ /sp-build "user authentication" # build specific feature
737
+ ```
738
+
739
+ **How it works:**
740
+
741
+ 1. **Phase 0: Build Context** — Finds changed files vs base branch, reads the spec (acceptance scenarios in `## Stories` section are the roadmap), checks `docs/specs/<feature>/.build-progress` to resume from a previous interrupted session, reads existing tests for patterns, fixtures, and naming conventions. Doesn't duplicate what already exists.
742
+ 2. **Phase 1: Decide What to Test** — Determines test scope from acceptance scenarios. Applies the **Completeness Principle**: AI writes tests ~50x faster than humans, so if full coverage costs `CC: ≤15m`, it writes complete tests without asking. Always checks 8 mandatory edge case categories: null/undefined, empty arrays/strings, invalid types, boundary values (min/max), error paths (network failures, DB errors), race conditions, large data (10k+ items), and special characters (Unicode, SQL chars).
743
+ 3. **Phase 1.5: Coverage Map** — Before writing a single test, traces every code path (if/else, switch, guard, try/catch) AND user flows (double-click, stale session, navigate away mid-op). Draws an ASCII diagram marking each path as `[★★★ TESTED]`, `[★★ TESTED]`, `[★ TESTED]`, or `[GAP]`. Gaps marked `[GAP] [→E2E]` need E2E tests; `[GAP] [→EVAL]` need evals — when flagged, defines capability + regression evals before implementing and reports pass@1/pass@3. **Regression rule:** if the diff changes existing behavior with no covering test, a regression test is a CRITICAL requirement — no asking, no skipping.
744
+ 4. **Phase 2: Write Tests** — Writes tests for every `[GAP]` identified in the Coverage Map. Before moving to Phase 3, verifies: all public functions have unit tests, all API endpoints have integration tests, edge cases covered, error paths tested, tests independent, assertions specific.
745
+ 5. **Phase 3: Build and Run** — Compiles/typechecks first, then runs tests.
746
+ 6. **Phase 4: Fix Loop** — If tests fail, fixes **test code only** (max 3 attempts, then hard stop and report). If tests expect X but code does Y, asks whether to fix production code or adjust the test — with effort scales `(human: ~X / CC: ~Y)`.
747
+ 7. **Phase 5: Report** — Summary with test counts, results, coverage, files touched, and any E2E/eval gaps to follow up on.
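The Phase 1.5 coverage map can be rendered from a list of traced paths. This sketch reproduces only the marker format, not the tracing itself. It assumes a hypothetical `{ name, stars, needs }` record per path: `stars` runs from 0 to 3, where 0 means untested, and `needs` is optionally `"E2E"` or `"EVAL"`.

```javascript
// Render the Phase 1.5 markers for traced code paths and user flows.
// The input shape is an assumption made for this sketch.
function marker(p) {
  if (p.stars > 0) return `[${"★".repeat(p.stars)} TESTED]`;
  return p.needs ? `[GAP] [→${p.needs}]` : "[GAP]";
}

function coverageMap(fnName, paths) {
  const lines = [fnName];
  paths.forEach((p, i) => {
    const branch = i === paths.length - 1 ? "└──" : "├──";
    lines.push(`${branch} ${p.name} ${marker(p)}`);
  });
  // Every [GAP] becomes a test to write in Phase 2.
  const gaps = paths.filter((p) => p.stars === 0).map((p) => p.name);
  return { diagram: lines.join("\n"), gaps };
}

const { diagram, gaps } = coverageMap("createOrder()", [
  { name: "valid cart → 201", stars: 3 },
  { name: "empty cart → 400", stars: 0 },
  { name: "double-click submit", stars: 0, needs: "E2E" },
]);
console.log(diagram);
console.log("Write tests for:", gaps);
```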
748
+
749
+ **Rules:**
750
+ - Never changes production code without asking first
751
+ - Never deletes or weakens existing tests
752
+ - Never adds `skip`/`xit`/`@disabled` to hide failures
753
+ - Max 3 fix attempts — then stops and reports the issue
754
+
755
+ **What NOT to test:** Private/internal methods, framework behavior, trivial getters/setters, implementation details.
756
+
757
+ ### /sp-investigate — Read-Only Root Cause Investigation (Optional)
758
+
759
+ **Usage:**
760
+ ```
761
+ /sp-investigate "production 500s after deploy on /api/orders"
762
+ /sp-investigate "intermittent data corruption in nightly sync"
763
+ ```
764
+
765
+ **When to use:** OPTIONAL branch before `/sp-fix`. Use for complex bugs, production outages, data corruption, unclear regressions, or when the user wants a diagnosis report without any code change. Skip for trivial/obvious bugs — go straight to `/sp-fix`.
766
+
767
+ **What it does NOT do:** Never edits source code, tests, or config. The only write it performs is the investigation report at `docs/investigate/<slug>-<date>.md`.
768
+
769
+ **How it works (adaptive depth, auto-scales):**
770
+
771
+ 1. **Phase 1: Understand the Report** — Extracts the symptom, expected behavior, and actual behavior from `$ARGUMENTS`. Asks ONE clarifying question via AskUserQuestion if required fields are missing.
772
+ 2. **Phase 2: Locate** — Entry-point search (error/stack/function/feature), recurring-bug check (3+ fix commits on same pattern → architectural smell), data-flow trace, git history (regression signal).
773
+ 3. **Phase 3: Pattern Match** — 12 known bug patterns (nil propagation, race, state corruption, off-by-one, type coercion, stale cache, config drift, silent error swallow, ordering/timing, resource leak, merge conflict, API contract). Skipped if Phase 2 already produced a HIGH-confidence hypothesis.
774
+ 4. **Phase 4: Form Hypothesis** — Specific, testable, falsifiable. Location + mechanism + causal chain + disproof condition + confidence (HIGH/MEDIUM/LOW). 3-strike rule: if 3 hypotheses all stay below MEDIUM → escalate via AskUserQuestion.
775
+ 5. **Phase 5: Map Blast Radius** — Investigation scope, bug path diagram (skipped if ISOLATED), impact scope (direct/indirect/data/user-facing), similar-risk scan (5-min timebox).
776
+ 6. **Phase 6: Recommend Next Steps** — CRITICAL/HIGH/MEDIUM actions, test strategy, fix approach (minimal / targeted refactor / architectural).
777
+ 7. **Output** — Writes structured Investigation Report to `docs/investigate/<slug>-<date>.md`. Signals `/sp-fix <file>` for handoff.
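The 3-strike rule in Phase 4 reduces to a small check over the hypotheses tried so far. In this sketch the HIGH/MEDIUM/LOW ordering and the escalation condition come from the text. Two parts are assumptions: how the best confidence maps onto a status value, and the `assess` name itself. `BLOCKED` is not derived here.

```javascript
// Sketch of the Phase 4 3-strike rule plus an assumed status mapping.
const RANK = { LOW: 0, MEDIUM: 1, HIGH: 2 };

function assess(hypotheses) {
  const best = hypotheses.reduce((m, h) => Math.max(m, RANK[h.confidence]), -1);
  // Three hypotheses tried and none reached MEDIUM: stop and ask the user.
  const escalate = hypotheses.length >= 3 && best < RANK.MEDIUM;
  let status = "INSUFFICIENT_EVIDENCE"; // a valid outcome, never inflated
  if (best === RANK.HIGH) status = "ROOT_CAUSE_FOUND";
  else if (best === RANK.MEDIUM) status = "PROBABLE_CAUSE";
  return { status, escalate };
}

const outcome = assess([
  { where: "sync/job.ts:88", confidence: "LOW" },
  { where: "db/pool.ts:12", confidence: "LOW" },
  { where: "cache/ttl.ts:40", confidence: "LOW" },
]);
console.log(outcome.status, outcome.escalate); // INSUFFICIENT_EVIDENCE true
```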
778
+
779
+ **Status values:** `ROOT_CAUSE_FOUND | PROBABLE_CAUSE | INSUFFICIENT_EVIDENCE | BLOCKED`
780
+
781
+ **Iron Law:** Follow evidence, never start with a theory. Every claim references file:line or git commit. INSUFFICIENT_EVIDENCE is a valid outcome — don't inflate confidence to ship a report.
782
+
783
+ **Token cost:** 8–15k
784
+
785
+ ---
786
+
787
+ ### /sp-fix — Test-First Bug Fix
788
+
789
+ **Usage:**
790
+ ```
791
+ /sp-fix "description of the bug"
792
+ ```
793
+
794
+ **How it works:**
795
+
796
+ 1. **Phase 0: Investigate** — Parses the bug report, locates relevant code, checks git history, and forms a root cause hypothesis. Then draws a **Bug Path Diagram** (same `[GAP]`/`[★★ TESTED]` format as `/sp-build`) for the buggy function — if no specific `[GAP]` path can be identified, the hypothesis isn't specific enough yet.
797
+ 2. **Phase 1: Write Failing Test** — **Regression rule first:** if the bug exists because the diff changed existing behavior with no test covering that path, a regression test is a CRITICAL requirement. Creates a test that reproduces the bug and **MUST fail** with current code.
798
+ 3. **Phase 2: Fix** — Minimal change only. Blast radius check: if fix touches >5 files, stops and asks before editing.
799
+ 4. **Phase 3: Verify** — Bug test must pass; full suite must show no new regressions.
800
+ 5. **Phase 4: Root Cause Analysis** — Documents: Symptom, Root cause, Gap (why wasn't this caught earlier?), Prevention (one of: type constraint, validation, lint rule, spec update). Non-optional for serious bugs.
801
+ 6. **Phase 5: Report** — Structured debug report with hypothesis, fix, evidence, and regression test reference.
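The test-first contract across Phases 1 to 3 can be expressed as a gate over test results. This is a sketch under assumptions: the result objects (`{ passed }`, `{ failedTests }`) and the `checkFix` name are hypothetical. Real values would come from your test runner and from the list of files the fix plans to touch.

```javascript
// Hypothetical /sp-fix gate: the bug test must fail before the fix and pass
// after it, the full suite must gain no new failures, and a fix planned across
// more than 5 files needs the user's go-ahead before editing.
function checkFix({ bugTestBefore, bugTestAfter, suiteBefore, suiteAfter, plannedFiles }) {
  const problems = [];
  if (bugTestBefore.passed) problems.push("bug test passes on current code, so it does not reproduce the bug");
  if (!bugTestAfter.passed) problems.push("bug test still fails after the fix");
  const alreadyFailing = new Set(suiteBefore.failedTests);
  const regressions = suiteAfter.failedTests.filter((t) => !alreadyFailing.has(t));
  if (regressions.length) problems.push(`new failures: ${regressions.join(", ")}`);
  return { ok: problems.length === 0, problems, askFirst: plannedFiles.length > 5 };
}

const verdict = checkFix({
  bugTestBefore: { passed: false },
  bugTestAfter: { passed: true },
  suiteBefore: { failedTests: ["upload retries (flaky)"] },
  suiteAfter: { failedTests: ["upload retries (flaky)"] },
  plannedFiles: ["src/orders/total.ts"],
});
console.log(verdict.ok, verdict.askFirst); // true false
```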
802
+
803
+ **Multiple bugs:** Triages by severity, fixes one at a time, commits each separately.
804
+
805
+ ### /sp-review — Pre-Merge Quality Gate
806
+
807
+ **Usage:**
808
+ ```
809
+ /sp-review # review all changes vs base branch
810
+ /sp-review src/auth/ # review specific directory
811
+ ```
812
+
813
+ **How it works:**
814
+
815
+ 1. **Phase 0: Understand Intent** — Reads commit messages, checks for related spec, expands blast radius. Also notes **what already exists**: flags if the diff rebuilds something that already exists in the codebase.
816
+ 2. **Phase 1: Smart Focus** — Auto-detects what to focus on based on the diff (auth → security, SQL → injection, payments → idempotency, etc.). Spends 60% of analysis on the primary focus.
817
+ 3. **Phase 2: Review** — Security, correctness, **API/Backend patterns** (unvalidated input, missing rate limiting, missing timeouts, missing CORS, error message leakage), spec-test alignment, code quality (including **diagram maintenance**: stale ASCII diagrams in comments are flagged), performance, a **Failure Mode Grid** for each new codepath (3 dimensions: test covers it? error handling exists? user sees a clear error or silent failure? — all 3 missing = Critical gap), and an **AI-generated code addendum** when reviewing AI-written changes (behavioral regressions, trust boundaries, architecture drift, model cost escalation).
818
+ 4. **Phase 3: Report** — Structured report. Every finding includes a **confidence score** `(confidence: N/10)`: 9-10 = verified in code; 7-8 = strong pattern match; 5-6 = possible false positive; <5 = appendix only. Includes a **"Not in scope"** section listing deferred work with rationale.
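The Failure Mode Grid and the confidence bands are simple enough to write down as two small functions. The three booleans mirror the grid's dimensions, and the cut-offs match the bands listed in Phase 3. The labels for partially covered paths and the placement strings are illustrative assumptions.

```javascript
// Failure Mode Grid: one row per new codepath, three yes/no dimensions.
function gridVerdict({ tested, handled, visibleError }) {
  const missing = [tested, handled, visibleError].filter((ok) => !ok).length;
  if (missing === 3) return "Critical gap";
  return missing === 0 ? "covered" : `partial (${missing}/3 missing)`;
}

// Confidence bands: 9-10 verified, 7-8 strong match, 5-6 possible FP, <5 appendix.
function placeFinding(confidence) {
  if (confidence >= 7) return "main report";
  if (confidence >= 5) return "main report, flagged as possible false positive";
  return "appendix";
}

console.log(gridVerdict({ tested: false, handled: false, visibleError: false })); // Critical gap
console.log(gridVerdict({ tested: true, handled: false, visibleError: false })); // partial (2/3 missing)
console.log(placeFinding(4)); // appendix
```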
819
+
820
+ **Proportional review:** A 5-line doc change gets a light review. A 500-line auth rewrite gets file-by-file deep analysis.
821
+
822
+ **Verdicts:** APPROVE / REQUEST CHANGES / NEEDS DISCUSSION.
823
+
824
+ **Rules:**
825
+ - At least 1 positive note — reinforces good patterns, not just problems
826
+ - Never auto-fixes code — report only
827
+ - Checks spec-test alignment: code changed → spec/acceptance scenarios/tests also changed?
828
+
829
+ ### /sp-commit — Smart Git Commit
830
+
831
+ **Usage:**
832
+ ```
833
+ /sp-commit
834
+ ```
835
+
836
+ **How it works:**
837
+
838
+ 1. **Analyze** — Scans `git status`, diff stats, and file contents in one pass.
839
+ 2. **Scan for secrets** — Matches patterns: `api_key`, `token`, `password`, `secret`, `private_key`, `credential`, `auth_token`. **Hard block** — stops immediately if found, non-negotiable.
840
+ 3. **Scan for debug code** — Matches: `console.log`, `debugger`, `print()`, `TODO:remove`, `HACK:`, `FIXME:temp`, `binding.pry`, `var_dump`. **Soft warn** — proceeds if you confirm.
841
+ 4. **Stage files** — Stages specific files by name. Never uses `git add -A`.
842
+ 5. **Generate message** — Conventional format: `type(scope): description`. Imperative tense ("add" not "added"), no period, WHAT+WHY not HOW.
843
+ 6. **Commit** — Does NOT push (safe default). Ask Claude explicitly to push.
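Steps 2 and 3 can be approximated as a scan over the lines a diff adds. The pattern lists are copied from the text. Two parts are assumptions: the matching strategy (only `+` lines of a unified diff, case-insensitive substring search) and the `scanDiff` name. Substring matching over-reports; for example, a variable named `tokenizer` would trip `token`.

```javascript
// Pattern lists copied from steps 2 and 3; the matching strategy is assumed.
const SECRET_PATTERNS = ["api_key", "token", "password", "secret", "private_key", "credential", "auth_token"];
const DEBUG_PATTERNS = ["console.log", "debugger", "print(", "TODO:remove", "HACK:", "FIXME:temp", "binding.pry", "var_dump"];

function scanDiff(diffText) {
  // Only lines added by the diff matter; skip the "+++ b/file" header.
  const added = diffText
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .map((line) => line.slice(1));
  const hits = (patterns) =>
    added.flatMap((line) =>
      patterns
        .filter((pattern) => line.toLowerCase().includes(pattern.toLowerCase()))
        .map((pattern) => ({ pattern, line })));
  const secrets = hits(SECRET_PATTERNS);
  const debug = hits(DEBUG_PATTERNS);
  // Secrets hard-block the commit; debug code only asks for confirmation.
  return { block: secrets.length > 0, secrets, warn: debug.length > 0, debug };
}

const diff = [
  "+++ b/src/pay.js",
  "+const apiKey = process.env.API_KEY;",
  "+console.log('charging', amount);",
  " unchanged();",
].join("\n");
const scan = scanDiff(diff);
console.log(scan.block, scan.warn); // true true
```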
844
+
845
+ **Large diff warning:** If >10 files OR >300 lines changed, suggests splitting into smaller commits for easier review.
846
+
847
+ **Never stages:** `.env`, credentials, build artifacts, generated files, binaries >1MB.
848
+
849
+ **Breaking changes:** If the diff removes/renames a public function, export, or API endpoint, uses `feat!` or `fix!` type, or adds a `BREAKING CHANGE:` footer.
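Message formatting, the large-diff threshold, and breaking-change marking fit in one helper. The `buildCommit` function and its input shape are hypothetical. The thresholds (>10 files or >300 lines) come from the text, as do the `type(scope): description` header, the `!` marker and the `BREAKING CHANGE:` footer, which follow the Conventional Commits convention. This sketch emits both the marker and the footer, which Conventional Commits allows.

```javascript
// Build a Conventional Commits message and flag diffs worth splitting.
function buildCommit({ type, scope, description, breaking, filesChanged, linesChanged }) {
  const desc = description.replace(/\.+$/, ""); // no trailing period
  const header = `${type}${scope ? `(${scope})` : ""}${breaking ? "!" : ""}: ${desc}`;
  const message = breaking ? `${header}\n\nBREAKING CHANGE: ${breaking}` : header;
  const suggestSplit = filesChanged > 10 || linesChanged > 300;
  return { message, suggestSplit };
}

const commit = buildCommit({
  type: "feat",
  scope: "api",
  description: "rename /users/list to /users.",
  breaking: "GET /users/list was removed; use GET /users",
  filesChanged: 4,
  linesChanged: 120,
});
console.log(commit.message);
// feat(api)!: rename /users/list to /users
//
// BREAKING CHANGE: GET /users/list was removed; use GET /users
```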
850
+
851
+ ### /sp-voices — Multi-LLM Review (Optional)
852
+
853
+ **Usage:**
854
+ ```
855
+ /sp-voices # review current diff with multi-LLM panel
856
+ /sp-voices docs/specs/auth/auth.md # review a spec
857
+ /sp-voices src/payment/ # review specific files
858
+ ```
859
+
860
+ **When to use:** Optional second opinion *after* `/sp-review` for high-stakes changes (auth, payment, data pipelines), when `/sp-review` returns mixed-confidence findings (most at 5–7), or any time you want cross-model verification before merge. Skip for routine refactors and small CRUD.
861
+
862
+ **How it works:**
863
+
864
+ 1. **Detect available LLMs** — Checks for OpenAI / Codex CLI / Gemini / Perplexity / Anthropic API / Ollama in priority order. Falls back to a self-spawned Claude sub-agent if no external LLM is available, with the limitation flagged in the report.
865
+ 2. **Construct open-ended review prompts** — Same material to every voice with a light bias nudge (correctness / security / design). No structured templates, no severity scale forced on reviewers — they think freely; *we* structure the synthesis.
866
+ 3. **Call voices in parallel** — 2–3 voices typically; temperature 0.3; graceful degradation if any voice fails.
867
+ 4. **Synthesize** — Parses free-form responses into findings, classifies severity/category ourselves, identifies CONSENSUS (2+ voices agree → REINFORCED), UNIQUE findings (single voice → flag for verification), and DISAGREEMENTS (voices contradict → present both sides; tiebreaker for HIGH+).
868
+ 5. **Output report** — Critical/High findings, disagreements, voice breakdown table, agreement rate (100% may indicate shared blind spot), blind spots (categories with 0 findings).
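Once findings are normalized, the consensus step can be sketched in a few lines. In this sketch each voice's findings are assumed to be already reduced to a `location` + `category` key; the real skill parses free-form responses. The category list and the agreement-rate definition (the share of findings raised by 2+ voices) are also assumptions. Detecting outright disagreements is not attempted here.

```javascript
// Assumed category list; the skill classifies categories itself.
const CATEGORIES = ["correctness", "security", "design"];

function synthesize(voices) {
  const byKey = new Map();
  for (const [voice, findings] of Object.entries(voices)) {
    for (const f of findings) {
      const key = `${f.location}|${f.category}`;
      if (!byKey.has(key)) byKey.set(key, { location: f.location, category: f.category, voices: [] });
      const entry = byKey.get(key);
      if (!entry.voices.includes(voice)) entry.voices.push(voice);
    }
  }
  const merged = [...byKey.values()].map((f) => ({
    ...f,
    label: f.voices.length >= 2 ? "REINFORCED" : "UNIQUE (verify)",
  }));
  const agreed = merged.filter((f) => f.label === "REINFORCED").length;
  const agreementRate = merged.length ? agreed / merged.length : 0;
  return {
    merged,
    agreementRate,
    // 100% agreement may point to a shared blind spot, not correctness.
    suspiciousConsensus: merged.length > 0 && agreementRate === 1,
    blindSpots: CATEGORIES.filter((c) => !merged.some((f) => f.category === c)),
  };
}

const report = synthesize({
  openai: [{ location: "auth.ts:47", category: "security" }],
  gemini: [{ location: "auth.ts:47", category: "security" }, { location: "db.ts:10", category: "correctness" }],
});
console.log(report.merged.map((f) => `${f.label}: ${f.location}`));
console.log("agreement:", report.agreementRate, "blind spots:", report.blindSpots);
```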
869
+
870
+ **Decision points** (all use `AskUserQuestion`): review type ambiguous, voice panel size for large reviews, voice unavailable, critical consensus finding, disagreement resolution, follow-up cost > $0.10, report destination.
871
+
872
+ **Rules:** Same material, different lenses. Don't resolve disagreements — present both sides and let the human decide. Consensus ≠ correctness (flag it if the agreement rate is 100%). Findings must be specific (`auth.ts:47`, not "code could be improved").
873
+
874
+ **Token cost:** 10–30k host + external API cost (Budget: ~$0.01–0.05; Standard: ~$0.05–0.20; Premium: ~$0.20–0.50 per review).
875
+
876
+ ---
877
+
878
+ ### /sp-humanize — Rephrase to Human Voice
879
+
880
+ **Usage:**
881
+ ```
882
+ /sp-humanize <paste plan/notes/draft> # infer format + audience from context
883
+ /sp-humanize reply jira <notes> # target a specific format
884
+ /sp-humanize draft a customer email <notes> # switch audience, hide implementation
885
+ ```
886
+
887
+ **When to use:** You have a plan, bullet notes, or an AI-generated draft and want it rewritten into natural, send-ready text — a PR description, release note, Slack announcement, postmortem, customer reply, LinkedIn post, or plain email. Not part of the spec-first dev cycle. Skip for pure translation, summarization, or generating content from zero.
888
+
889
+ **How it works:**
890
+
891
+ 1. **Infer target format** — From explicit instruction → session context → input shape → fallback to tight plain text. No fixed whitelist; uncommon or hybrid formats follow their own conventions.
892
+ 2. **Infer audience** — Engineering, customer, executive, public, or mixed. Same content, phrasing shifts by reader (technical terms for engineers, outcome-focused for customers).
893
+ 3. **Preserve facts** — Numbers, names, error codes, file paths, commands, URLs, commitments, and decisions are never paraphrased. Certainty is never softened ("will ship Monday" ≠ "hope to ship Monday").
894
+ 4. **Strip AI tone** — Removes em-dash overuse, banned buzzwords (EN + VI), hollow openings/closings, fake enthusiasm, and "rule of three" pile-ups. Varies sentence rhythm.
895
+ 5. **Return send-ready text** — The final version directly, no preamble, no explanation of edits.
896
+
897
+ **Language:** Follows the session's dominant language. Mixed Vietnamese-English is fine — technical terms stay untranslated.
898
+
899
+ **Token cost:** 2–6k, no external API.
900
+
901
+ ---
902
+
903
+ ## 6. Automatic Guards (Hooks)
904
+
905
+ Hooks run automatically — you don't invoke them. They provide passive protection.
906
+
907
+ ### File Guard (`file-guard.js`)
908
+
909
+ **Trigger:** After every Write or Edit operation.
910
+ **Action:** If a modified **source code file** exceeds 350 lines, injects a warning suggesting modularization. Docs, configs, and templates are intentionally excluded — they are naturally long.
911
+ **Blocking:** No — warns only, does not prevent the edit.
912
+
913
+ **Checked extensions:** `.ts`, `.tsx`, `.js`, `.jsx`, `.py`, `.php`, `.rb`, `.rs`, `.go`, `.swift`, `.kt`, `.java`, `.cs`, `.cpp`, `.c`, `.dart`, `.vue`, `.svelte`, `.astro`, and more.
914
+ **Not checked:** `.md`, `.json`, `.yaml`, `.toml`, `.html`, `.css`, `.sh`, and other non-source files.
915
+
916
+ **Configuration:**
917
+ ```bash
918
+ # Change the line threshold (default: 350)
919
+ export FILE_GUARD_THRESHOLD=500
920
+
921
+ # Exclude files from checking (comma-separated globs)
922
+ export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js"
923
+ ```
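The decision File Guard makes after each write is sketched below. This is not the shipped `file-guard.js`. The extension set is only the subset listed above; the real hook checks more. The glob handling is also an assumption about how `FILE_GUARD_EXCLUDE` is interpreted: only `*` acts as a wildcard, and patterns are matched against the file's basename.

```javascript
const path = require("node:path");

// Subset of the documented source extensions; the real hook checks more.
const SOURCE_EXT = new Set([".ts", ".tsx", ".js", ".jsx", ".py", ".php", ".rb", ".rs", ".go",
  ".swift", ".kt", ".java", ".cs", ".cpp", ".c", ".dart", ".vue", ".svelte", ".astro"]);

// Turn a simple glob into an anchored regex: "*" matches anything, all else is literal.
function globToRegex(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function shouldWarn(filePath, lineCount, env = process.env) {
  const threshold = Number(env.FILE_GUARD_THRESHOLD) || 350;
  const excludes = (env.FILE_GUARD_EXCLUDE || "").split(",").map((g) => g.trim()).filter(Boolean);
  if (!SOURCE_EXT.has(path.extname(filePath))) return false; // docs/configs are skipped
  if (excludes.some((g) => globToRegex(g).test(path.basename(filePath)))) return false;
  return lineCount > threshold; // warn only when the file exceeds the threshold
}

const env = { FILE_GUARD_THRESHOLD: "500", FILE_GUARD_EXCLUDE: "*.generated.swift,*.pb.go" };
console.log(shouldWarn("src/api/users.ts", 620, env)); // true
console.log(shouldWarn("Models/User.generated.swift", 900, env)); // false (excluded)
console.log(shouldWarn("docs/WORKFLOW.md", 900, env)); // false (not source)
```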
924
+
925
+ ### Path Guard (`path-guard.sh`)
926
+
927
+ **Trigger:** Before every Bash command.
928
+ **Action:** Blocks commands that reference large directories (node_modules, build artifacts, etc.).
929
+ **Blocking:** Yes — prevents the command from running.
930
+
931
+ **Default blocked paths:**
932
+ `node_modules`, `__pycache__`, `.git/objects`, `dist/`, `build/`, `.next/`, `vendor/`, `Pods/`, `.build/`, `DerivedData/`, `.gradle/`, `target/debug`, `target/release`, `.nuget`, `.cache`
933
+
934
+ **Configuration:**
935
+ ```bash
936
+ # Add project-specific blocked paths (pipe-separated)
937
+ export PATH_GUARD_EXTRA="\.terraform|\.vagrant|\.docker"
938
+ ```
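At its core, Path Guard's check is a single regular expression run over the command string. The sketch below mirrors that idea in JavaScript rather than shell. The way the pattern is assembled is an assumption, not code copied from `path-guard.sh`. `PATH_GUARD_EXTRA` is treated as ready-made regex syntax, as its pipe-separated format suggests.

```javascript
// Default blocked directories from the list above, escaped for regex use.
const DEFAULT_BLOCKED = ["node_modules", "__pycache__", ".git/objects", "dist/", "build/", ".next/",
  "vendor/", "Pods/", ".build/", "DerivedData/", ".gradle/", "target/debug", "target/release",
  ".nuget", ".cache"];

const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function isBlocked(command, extra = process.env.PATH_GUARD_EXTRA || "") {
  const parts = DEFAULT_BLOCKED.map(escapeRe);
  if (extra) parts.push(extra); // already regex syntax, appended as-is
  return new RegExp(parts.join("|")).test(command);
}

console.log(isBlocked("ls node_modules", "")); // true
console.log(isBlocked("ls src", "")); // false
console.log(isBlocked("du -sh .terraform/providers", "\\.terraform|\\.vagrant")); // true
```

Plain substring matching is deliberately blunt: `npm run build` passes because it never mentions `build/`, but a path such as `src/build/` would also be blocked.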
939
+
940
+ ### Glob Guard (`glob-guard.js`)
941
+
942
+ **Trigger:** Before every Glob (file search) operation.
943
+ **Action:** Blocks overly broad glob patterns at project root that would return thousands of files and fill the context window.
944
+ **Blocking:** Yes — prevents the glob and suggests scoped alternatives.
945
+
946
+ **What it blocks:**
947
+ - `**/*.ts` at project root (use `src/**/*.ts` instead)
948
+ - `**/*` at project root (use `src/**/*` instead)
949
+ - `*` or `**` at project root
950
+ - Any recursive glob without a specific directory prefix
951
+
952
+ **What it allows:**
953
+ - `src/**/*.ts` — scoped to a specific directory
954
+ - `tests/**/*.test.js` — scoped to tests
955
+ - `**/*.ts` when run from inside a scoped directory (e.g., `path: "src"`)
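Glob Guard's rules come down to two questions: is the search rooted at the project root, and does the pattern begin with a wildcard? In this sketch a missing `path` (or `"."`) is assumed to mean the project root. The reason strings are illustrative, and the real `glob-guard.js` may use different heuristics.

```javascript
// Returns a reason string when the glob should be blocked, or null when allowed.
function checkGlob(pattern, searchPath) {
  const atRoot = !searchPath || searchPath === "." || searchPath === "./";
  if (!atRoot) return null; // e.g. path: "src" already scopes the search
  if (pattern === "*" || pattern === "**") return "wildcard at project root";
  // Recursive glob with no directory prefix, e.g. "**/*.ts" or "**/*".
  if (pattern.startsWith("**")) return `too broad; scope it, e.g. "src/${pattern}"`;
  return null;
}

console.log(checkGlob("**/*.ts")); // too broad; scope it, e.g. "src/**/*.ts"
console.log(checkGlob("src/**/*.ts")); // null
console.log(checkGlob("**/*.ts", "src")); // null
```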
956
+
957
+ ### Comment Guard (`comment-guard.js`)
958
+
959
+ **Trigger:** After every Edit operation.
960
+ **Action:** Detects when real code is replaced with placeholder comments like `// ... existing code ...` or `// rest of implementation`. This is a common LLM laziness pattern.
961
+ **Blocking:** Yes — rejects the edit and tells Claude to preserve the original code.
962
+
963
+ **What it catches:**
964
+ - `// ... existing code ...`, `// ... rest of implementation`
965
+ - `// [previous code remains]`, `// unchanged`
966
+ - `/* ... */` replacing real code
967
+ - `# ... existing ...` (Python placeholders)
968
+ - `// TODO: implement` replacing real code
969
+ - Any edit where real code is replaced with a much shorter comment-only block
970
+
971
+ **What it allows:**
972
+ - Editing comments (old content was already comments)
973
+ - Adding comments alongside code (new content has both)
974
+ - Normal code replacements
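A sketch of the core heuristic follows. An edit is blocked when the old text contained code, the new text is non-empty but comment-only, and the new text either reads like a placeholder or is much shorter than the old. The placeholder regex and the "much shorter" ratio (under one third of the length) are assumptions, not values taken from `comment-guard.js`.

```javascript
const PLACEHOLDER = /\.\.\.|existing code|rest of (the )?implementation|previous code|unchanged|TODO: implement/i;

// Comment-only line for //, #, /* */ and " * " continuation styles.
const isCommentLine = (line) => /^\s*(\/\/|#|\/\*|\*)/.test(line);

function hasCode(text) {
  return text.split("\n").some((line) => line.trim() !== "" && !isCommentLine(line));
}

function isLazyReplacement(oldString, newString) {
  if (newString.trim() === "") return false; // plain deletion is a normal edit
  if (!hasCode(oldString)) return false;     // editing comments is fine
  if (hasCode(newString)) return false;      // new text still contains code
  const shrunk = newString.trim().length < oldString.trim().length / 3;
  return PLACEHOLDER.test(newString) || shrunk;
}

console.log(isLazyReplacement("function hello() {\n  return world;\n}", "// ... existing code ...")); // true
console.log(isLazyReplacement("return a;", "return b;")); // false
```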
975
+
976
+ ### Sensitive Guard (`sensitive-guard.sh`)
977
+
978
+ **Trigger:** Before every Read, Write, Edit, and Bash command.
979
+ **Action:** Protects files containing secrets: `.env`, private keys, credentials, tokens.
980
+ **Blocking:** Read/Write/Edit → **blocks** (exit 2). Bash commands → **warns only** (allows access).
981
+
982
+ The Bash warn-only behavior enables an approval flow: Claude asks the user for permission and, if the user approves, can read the file by running `cat .env` through the Bash tool.
983
+
984
+ **Protected files:**
985
+ - `.env`, `.env.local`, `.env.production`, etc. (but NOT `.env.example`)
986
+ - Private keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.jks`
987
+ - SSH keys: `id_rsa`, `id_ecdsa`, `id_ed25519`
988
+ - Cloud credentials: `serviceAccountKey.json`, `firebase-adminsdk*`
989
+ - Token files: `.npmrc`, `.pypirc`, `.netrc`
990
+ - Any file matching `*credential*`, `*secret*`, `*private_key*`
991
+
992
+ **Supports `.agentignore`:** Create a `.agentignore` file (or `.aiignore`, `.cursorignore`) in the project root with gitignore-style patterns to add project-specific protections.
993
+
994
+ **Configuration:**
995
+ ```bash
996
+ # Add extra patterns (pipe-separated regex)
997
+ export SENSITIVE_GUARD_EXTRA="\.vault|.*_token\.json"
998
+ ```
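Here is a sketch of the file-name side of the check and the tool-dependent outcome, written in JavaScript rather than the hook's shell. The patterns come from the "Protected files" list above. Several parts are simplifications: matching on the basename only, treating `.env.example` as the only allowed `.env*` file, and skipping `.agentignore` support. For Bash, the sketch assumes the file path has already been extracted from the command.

```javascript
const path = require("node:path");

// Basename patterns derived from the "Protected files" list above.
const PROTECTED = [
  /^\.env(\..+)?$/,                 // .env, .env.local, .env.production, ...
  /\.(pem|key|p12|pfx|jks)$/,       // private keys and keystores
  /^id_(rsa|ecdsa|ed25519)$/,       // SSH private keys
  /^serviceAccountKey\.json$/, /^firebase-adminsdk/,
  /^\.(npmrc|pypirc|netrc)$/,       // token files
  /credential|secret|private_key/i,
];

function isSensitive(filePath, extra = "") {
  const name = path.basename(filePath);
  if (name === ".env.example") return false; // sample file, safe to read
  if (extra && new RegExp(extra).test(filePath)) return true; // SENSITIVE_GUARD_EXTRA
  return PROTECTED.some((re) => re.test(name));
}

// Read/Write/Edit on a sensitive path → block (exit 2); Bash → warn and allow.
function decide(tool, filePath, extra) {
  if (!isSensitive(filePath, extra)) return "allow";
  return tool === "Bash" ? "warn" : "block";
}

console.log(decide("Read", ".env"), decide("Read", ".env.example"), decide("Bash", ".env.local"));
// block allow warn
```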
999
+
1000
+ ### Self-Review (`self-review.sh`)
1001
+
1002
+ **Trigger:** When Claude is about to stop (Stop event).
1003
+ **Action:** Injects a self-review checklist reminding Claude to verify quality before finishing.
1004
+ **Blocking:** No — just a reminder.
1005
+
1006
+ **Questions asked:**
1007
+ 1. Did you leave any TODO/FIXME that should be resolved now?
1008
+ 2. Did you create mock/fake implementations just to pass tests?
1009
+ 3. Did you replace real code with placeholder comments?
1010
+ 4. Do all changed files compile and typecheck cleanly?
1011
+ 5. Did you run the full test suite, not just the new tests?
1012
+ 6. Are there any files you modified but forgot to include in the summary?
1013
+
1014
+ **Configuration:**
1015
+ ```bash
1016
+ # Disable self-review
1017
+ export SELF_REVIEW_ENABLED=false
1018
+ ```
1019
+
1020
+ ### Testing Hooks Manually
1021
+
1022
+ You can test hooks by piping mock JSON payloads:
1023
+
1024
+ ```bash
1025
+ # ── Path Guard ──
1026
+ # Should exit 2 (blocked)
1027
+ echo '{"tool_input":{"command":"ls node_modules"}}' | bash .claude/hooks/path-guard.sh
1028
+ echo $? # expect: 2
1029
+
1030
+ # Should exit 0 (allowed)
1031
+ echo '{"tool_input":{"command":"ls src"}}' | bash .claude/hooks/path-guard.sh
1032
+ echo $? # expect: 0
1033
+
1034
+ # ── File Guard ──
1035
+ seq 1 400 > /tmp/test-large.js
+ echo '{"tool_input":{"file_path":"/tmp/test-large.js"}}' | node .claude/hooks/file-guard.js
+ # Should output JSON with an additionalContext warning (400 lines exceeds the default 350-line threshold)
1038
+
1039
+ # ── Comment Guard ──
1040
+ # Should exit 2 (blocked — replacing code with placeholder)
1041
+ echo '{"tool_input":{"old_string":"function hello() {\n return world;\n}","new_string":"// ... existing code ..."}}' | node .claude/hooks/comment-guard.js
1042
+ echo $? # expect: 2
1043
+
1044
+ # Should exit 0 (allowed — replacing code with code)
1045
+ echo '{"tool_input":{"old_string":"return a;","new_string":"return b;"}}' | node .claude/hooks/comment-guard.js
1046
+ echo $? # expect: 0
1047
+
1048
+ # ── Sensitive Guard ──
1049
+ # Should exit 2 (blocked)
1050
+ echo '{"tool_input":{"file_path":".env"}}' | bash .claude/hooks/sensitive-guard.sh
1051
+ echo $? # expect: 2
1052
+
1053
+ # Should exit 0 (allowed)
1054
+ echo '{"tool_input":{"file_path":".env.example"}}' | bash .claude/hooks/sensitive-guard.sh
1055
+ echo $? # expect: 0
1056
+
1057
+ # Should exit 0 (warn only — bash commands are allowed for approved access)
1058
+ echo '{"tool_input":{"command":"cat .env.local"}}' | bash .claude/hooks/sensitive-guard.sh
1059
+ echo $? # expect: 0 (with warning on stderr)
1060
+
1061
+ # ── Glob Guard ──
1062
+ # Should exit 2 (blocked — broad pattern at root)
1063
+ echo '{"tool_input":{"pattern":"**/*.ts"}}' | node .claude/hooks/glob-guard.js
1064
+ echo $? # expect: 2
1065
+
1066
+ # Should exit 0 (allowed — scoped pattern)
1067
+ echo '{"tool_input":{"pattern":"src/**/*.ts"}}' | node .claude/hooks/glob-guard.js
1068
+ echo $? # expect: 0
1069
+ ```
+
+ ---
+
+ ## 7. Spec Format
+
+ ### Spec Template
+
+ Create specs at `docs/specs/<feature>/<feature>.md`:
+
+ ```markdown
+ # Spec: <Feature Name>
+
+ **Created:** 2026-04-02
+ **Last updated:** 2026-04-02
+ **Status:** Draft | Active | Deprecated
+
+ ## Overview
+ What this feature does, why it exists, who uses it. 2-3 sentences.
+
+ ## Data Model
+ Entities, attributes, relationships (if applicable).
+
+ ## Stories
+
+ ### S-001: <Story name> (P0)
+
+ **Description:** [user story]
+ **Source:** [optional: ticket/issue ref]
+
+ **Acceptance Scenarios:**
+
+ AS-001: <short description>
+ - **Given:** [state]
+ - **When:** [action]
+ - **Then:** [expected]
+ - **Data:** [test data]
+
+ AS-002: <short description>
+ - **Given:** [error state]
+ - **When:** [action]
+ - **Then:** [error handling]
+
+ ### S-002: <Story name> (P1)
+
+ AS-003: <short description>
+ - **Given:** [state]
+ - **When:** [action]
+ - **Then:** [expected]
+
+ ### S-003: <Story name> (P2)
+
+ AS-004: <short description>
+ - [flow description + expected behavior]
+
+ ## Constraints & Invariants
+ Rules that must always hold.
+
+ ## Change Log
+
+ | Date | Change | Ref |
+ |------|--------|-----|
+ | 2026-04-02 | Initial creation | -- |
+ ```
+
+ Skip sections that don't apply, and match the level of detail to the feature's complexity.
+
+ **Acceptance Scenario depth by priority:**
+ - **P0:** Full Given + When + Then + Data + Setup. At least one happy path and one error path.
+ - **P1:** Given + When + Then. At least one happy path.
+ - **P2:** A 1-2 line flow description. At least one scenario.
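The per-scenario field rules above are mechanical enough to lint. This hypothetical checker (not part of specpipe) collects the `- **Label:**` fields in one scenario and reports which ones its priority requires but lacks. It does not judge the happy-path and error-path coverage rules, which need a human.

```javascript
// Hypothetical lint rule for "Acceptance Scenario depth by priority".
// Required fields per priority, taken from the list above.
const REQUIRED_FIELDS = {
  P0: ["Given", "When", "Then", "Data", "Setup"],
  P1: ["Given", "When", "Then"],
  P2: [], // 1-2 line flow description; no fixed fields
};

// Collect field labels written as "- **Label:**" in a scenario body.
function fieldsIn(scenarioText) {
  const labels = new Set();
  for (const match of scenarioText.matchAll(/^\s*-\s+\*\*([A-Za-z]+):\*\*/gm)) {
    labels.add(match[1]);
  }
  return labels;
}

// Return the required fields missing from a scenario of the given priority.
function missingFields(priority, scenarioText) {
  const required = REQUIRED_FIELDS[priority];
  if (!required) throw new Error(`Unknown priority: ${priority}`);
  const present = fieldsIn(scenarioText);
  return required.filter((field) => !present.has(field));
}
```

Applied to the template's AS-001, this reports `Setup` as missing, because the template shows only Given, When, Then, and Data for that P0 scenario.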
+
+ ### Snapshots (Version History)
+
+ When `/sp-plan` Mode C detects a Major change (a new story, a removed story, a priority change, a flow change, a P0 behavior change, or a constraint change), it automatically snapshots the spec before updating it:
+
+ ```
+ docs/specs/<feature>/snapshots/
+   2026-04-02.md            ← full copy at that point in time
+   2026-04-05-BILL-101.md   ← with ticket reference
+ ```
+
+ Snapshots are immutable, managed by `/sp-plan` (not by developers), and capped at the 5 most recent.
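The naming and retention rules can be sketched as two small helpers. This is a hypothetical illustration, not `/sp-plan`'s code. It assumes the oldest snapshots are removed first once more than 5 exist, and that sorting the filenames gives date order because they start with ISO dates.

```javascript
// Hypothetical helpers mirroring the snapshot rules above; not /sp-plan's code.
const MAX_SNAPSHOTS = 5;

// "YYYY-MM-DD.md", or "YYYY-MM-DD-<REF>.md" when a ticket ref is given.
function snapshotName(date, ref) {
  const day = date.toISOString().slice(0, 10); // calendar date in UTC
  return ref ? `${day}-${ref}.md` : `${day}.md`;
}

// Names to delete so that only the `max` most recent snapshots remain.
// Assumes every name starts with YYYY-MM-DD, so string order is date order.
function snapshotsToPrune(names, max = MAX_SNAPSHOTS) {
  const oldestFirst = [...names].sort();
  return oldestFirst.slice(0, Math.max(0, oldestFirst.length - max));
}
```

Neither helper ever rewrites an existing file. Snapshots are only created and pruned, which keeps them immutable.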
+
+ ### Naming Conventions
+
+ | Item | Convention | Example |
+ |------|-----------|---------|
+ | Spec directory | `docs/specs/<feature>/` | `docs/specs/user-auth/` |
+ | Spec file | `<feature>.md` in feature directory | `user-auth.md` |
+ | Story ID | `S-NNN` sequential per spec | `S-001`, `S-005` |
+ | Scenario ID | `AS-NNN` sequential across all stories | `AS-001`, `AS-042` |
+ | Priority | `P0` (critical), `P1` (important), `P2` (nice-to-have) — per story | — |
+ | Snapshot | `YYYY-MM-DD.md` or `YYYY-MM-DD-<REF>.md` in `snapshots/` | `2026-04-02.md` |
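The ID rules (stories numbered per spec, scenarios numbered across all stories instead of restarting at each story) are easy to check mechanically. This hypothetical script is not shipped with specpipe. It assumes IDs appear as they do in the template: `### S-NNN:` headings and `AS-NNN:` at the start of a line.

```javascript
// Hypothetical check for the S-NNN / AS-NNN numbering rules; not part of specpipe.

// Extract numeric IDs in document order; `regex` must be global and capture the digits.
function idsInOrder(markdown, regex) {
  return [...markdown.matchAll(regex)].map((m) => Number(m[1]));
}

// Report every ID that is not exactly previous + 1, starting at 1.
// A single gap therefore also flags every later ID.
function numberingErrors(markdown) {
  const errors = [];
  const checks = [
    ["S", /^### S-(\d+):/gm],
    ["AS", /^AS-(\d+):/gm],
  ];
  for (const [prefix, regex] of checks) {
    idsInOrder(markdown, regex).forEach((id, i) => {
      if (id !== i + 1) {
        const found = String(id).padStart(3, "0");
        const expected = String(i + 1).padStart(3, "0");
        errors.push(`${prefix}-${found} found, expected ${prefix}-${expected}`);
      }
    });
  }
  return errors;
}
```

The second test case below is the most common mistake: restarting scenario numbering at `AS-001` for each new story.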
+
+ ---
+
+ ## 8. Customization
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `FILE_GUARD_THRESHOLD` | `200` | Maximum file length, in lines, before the file guard warns |
+ | `FILE_GUARD_EXCLUDE` | _(empty)_ | Comma-separated globs to skip (e.g. `*.generated.swift`) |
+ | `PATH_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns to block (e.g. `\.terraform`) |
+ | `SENSITIVE_GUARD_EXTRA` | _(empty)_ | Additional pipe-separated patterns for sensitive files (e.g. `\.vault`) |
+ | `SELF_REVIEW_ENABLED` | `true` | Set to `false` to disable the self-review checklist that runs on the `Stop` hook |
+
+ Set these in your shell profile, or in the project's `.envrc` if you use direnv.
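One way a Node-based guard could read the variables in the table and apply their defaults is sketched below. The sketch is hypothetical and does not reproduce the guards' actual parsing, since several guards are shell scripts. It assumes the pipe-separated values are regular-expression alternatives, and its base path pattern is a placeholder, not the real block list.

```javascript
// Hypothetical parsing of the guard environment variables; defaults follow the table above.
function readGuardConfig(env = process.env) {
  // FILE_GUARD_THRESHOLD: positive integer, default 200.
  const parsed = Number.parseInt(env.FILE_GUARD_THRESHOLD ?? "", 10);
  const threshold = Number.isInteger(parsed) && parsed > 0 ? parsed : 200;

  // FILE_GUARD_EXCLUDE: comma-separated globs; blank entries are dropped.
  const exclude = (env.FILE_GUARD_EXCLUDE ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);

  // PATH_GUARD_EXTRA: pipe-separated regex alternatives appended to a base pattern.
  // The base list here is a placeholder, not the real path-guard list.
  const basePathPattern = "node_modules|build|dist";
  const extra = (env.PATH_GUARD_EXTRA ?? "").trim();
  const pathGuard = new RegExp(extra ? `${basePathPattern}|${extra}` : basePathPattern);

  // SELF_REVIEW_ENABLED: only the literal string "false" disables it.
  const selfReview = env.SELF_REVIEW_ENABLED !== "false";

  return { threshold, exclude, pathGuard, selfReview };
}
```

`SENSITIVE_GUARD_EXTRA` would be appended to the sensitive-file pattern in the same way as `PATH_GUARD_EXTRA`.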
+
+ ### Extending CLAUDE.md
+
+ Add project-specific rules to `.claude/CLAUDE.md`:
+
+ ```markdown
+ ## Project-Specific Rules
+
+ - All API endpoints must have OpenAPI annotations
+ - Database migrations must be reversible
+ - UI components must support dark mode
+ - All strings must be localized via i18n keys
+ ```
+
+ ### Adding Custom Skills
+
+ Create new skills in `.claude/skills/<name>/SKILL.md`:
+
+ ```markdown
+ # .claude/skills/deploy/SKILL.md
+
+ Run the deployment pipeline:
+ 1. /sp-review
+ 2. /sp-commit
+ 3. Run: bash scripts/deploy.sh $ARGUMENTS
+ 4. Verify deployment health: curl -f https://api.example.com/health
+ ```
+
+ Then invoke it as `/deploy staging`, where `staging` is passed to the skill as `$ARGUMENTS`.
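Conceptually, the text typed after the skill name is substituted for `$ARGUMENTS` before Claude reads the skill. The functions below are a hypothetical model of that substitution. They assume every occurrence receives the full argument string, and Claude Code's own handling (for example, of positional arguments) may differ.

```javascript
// Hypothetical model of skill invocation; not Claude Code's implementation.

// Split "/deploy staging" into the skill name and its argument string.
function parseInvocation(input) {
  const match = /^\/(\S+)\s*(.*)$/s.exec(input.trim());
  if (!match) return null; // not a slash invocation
  return { skill: match[1], args: match[2] };
}

// Replace every "$ARGUMENTS" in the skill body with the trimmed argument string.
function expandSkill(body, argString) {
  return body.split("$ARGUMENTS").join(argString.trim());
}
```

The model uses `split`/`join` instead of `String.prototype.replace` to avoid surprises when the arguments contain `$&` or `$1`, which `replace` treats as special replacement patterns.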
+
+ ---
+
+ ## 9. Token Cost Guide
+
+ | Activity | Tokens (approx.) | When |
+ |----------|--------|-----------|
+ | `/sp-scaffold` (greenfield bootstrap) | 15–40k + install/build time | Once per new project, before the first spec |
+ | `/sp-build` (incremental, 1-3 files) | 5–10k | Every code chunk |
+ | `/sp-investigate` (complex bug) | 8–15k | Optional, before `/sp-fix`; complex bugs or outages only |
+ | `/sp-fix` (single bug) | 3–5k | As needed |
+ | `/sp-commit` | 2–4k | Every commit |
+ | `/sp-review` (diff-based) | 10–20k | Before merge |
+ | `/sp-plan` (new feature) | 20–40k | Start of feature |
+ | `/sp-challenge` (adversarial review) | 15–30k | After `/sp-plan`, for complex features |
+ | `/sp-spec-render` (HTML view) | 3–8k | On demand after `/sp-plan` when an HTML view is wanted, or to refresh a stale `.html` |
+ | `/sp-md-render` (HTML view, any md) | 3–8k | On demand for non-spec markdown: investigation notes, explore docs, RFCs, retros, READMEs |
+ | `/sp-voices` (multi-LLM review) | 10–30k + external API cost (~$0.01–0.50) | Optional, after `/sp-review`, for high-stakes changes |
+ | Full audit (manual prompt) | 100k+ | Before release |
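To turn the table into a rough per-feature budget, sum the ranges of the steps you expect to run. This sketch uses the table's figures for four activities. The workflow mix in the example (one `/sp-plan`, four build-and-commit cycles, one `/sp-review`) is illustrative only.

```javascript
// Rough token budget from the ranges in the table above (thousands of tokens).
const COST_K = {
  plan: [20, 40],
  build: [5, 10],
  commit: [2, 4],
  review: [10, 20],
};

// steps: array of [activity, count]; returns [low, high] in thousands of tokens.
function estimateK(steps) {
  let low = 0;
  let high = 0;
  for (const [activity, count] of steps) {
    if (!COST_K[activity]) throw new Error(`No cost data for: ${activity}`);
    const [min, max] = COST_K[activity];
    low += min * count;
    high += max * count;
  }
  return [low, high];
}

// Example: one plan, four incremental build + commit cycles, one review.
const budget = estimateK([["plan", 1], ["build", 4], ["commit", 4], ["review", 1]]);
console.log(`${budget[0]}k to ${budget[1]}k tokens`);
```

The total is 58k to 116k tokens. The four incremental builds account for 20k to 40k of that, compared with 50k+ for a single `/sp-build` over the combined diff.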
+
+ ### Minimizing Token Usage
+
+ - **Test incrementally.** Running `/sp-build` after each small chunk costs 5-10k tokens. Waiting until everything is done and then running `/sp-build` on a large diff costs 50k+.
+ - **Use filters.** `/sp-build src/auth/login.ts` is cheaper than running `/sp-build` on the whole project.
+ - **Skip `/sp-plan` for tiny changes.** Under 5 lines with no behavior change? Just `/sp-build` and `/sp-commit`.
+ - **Use `/sp-review` only before merge,** not after every commit.
+
+ ---
+
+ ## 10. Troubleshooting
+
+ ### Hook not firing
+
+ **Symptom:** The file guard or path guard doesn't trigger.
+
+ **Check:**
+ 1. Is `settings.json` valid? `node -e "JSON.parse(require('fs').readFileSync('.claude/settings.json','utf-8'))"`
+ 2. Are hooks executable? `ls -la .claude/hooks/`
+ 3. Is Node.js available? `node --version`
+ 4. Is `$CLAUDE_PROJECT_DIR` set? Check from inside Claude Code with `echo $CLAUDE_PROJECT_DIR`.
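Checks 1 and 2 can be combined into one script that parses `.claude/settings.json` and lists every hook command it registers, so you can confirm that each referenced script exists. This sketch assumes the Claude Code hooks layout `hooks.<Event>[].hooks[].command`. `hookCommands` is a hypothetical helper.

```javascript
// List hook commands registered in a Claude Code settings object.
// Assumes the hooks.<Event>[].hooks[].command layout.
function hookCommands(settings) {
  const result = [];
  for (const [event, groups] of Object.entries(settings.hooks ?? {})) {
    for (const group of groups) {
      for (const hook of group.hooks ?? []) {
        if (hook.type === "command") {
          result.push({ event, matcher: group.matcher, command: hook.command });
        }
      }
    }
  }
  return result;
}

// Usage against the real file (JSON.parse throws a SyntaxError if it is invalid):
// const fs = require("fs");
// const settings = JSON.parse(fs.readFileSync(".claude/settings.json", "utf-8"));
// for (const h of hookCommands(settings)) console.log(h.event, h.matcher, h.command);
```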
+
+ ### Tests not detected
+
+ **Symptom:** `/sp-build` or `/sp-fix` can't work out how to run the tests.
+
+ **Check:**
+ 1. Are you in the project root? `pwd`
+ 2. Does a project marker file exist (e.g., `package.json`, `Cargo.toml`, `pyproject.toml`)?
+ 3. If your test command is non-standard, set it explicitly in `.claude/CLAUDE.md` under **Testing** so the skills use it.
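In essence, the detection described above finds a marker file and maps it to a default command. The marker table in this sketch is an assumption for illustration, and the skills' real list and commands may differ. Like the skills, it stops at the first marker found, which is why monorepos need a per-project override.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical marker -> test command table, checked in this order.
const MARKERS = [
  ["package.json", "npm test"],
  ["Cargo.toml", "cargo test"],
  ["pyproject.toml", "pytest"],
  ["go.mod", "go test ./..."],
];

// Return the command for the first marker present in `dir`, or null.
// `exists` is injectable so the lookup can be tested without real files.
function detectTestCommand(dir, exists = fs.existsSync) {
  for (const [marker, command] of MARKERS) {
    if (exists(path.join(dir, marker))) return command;
  }
  return null;
}
```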
+
+ ### Wrong base branch
+
+ **Symptom:** `/sp-build` or `/sp-review` compares against the wrong branch.
+
+ **Check:**
+ ```bash
+ git symbolic-ref refs/remotes/origin/HEAD
+ ```
+
+ If the output names the wrong branch, or the command fails because the ref is missing, set it:
+ ```bash
+ git remote set-head origin <your-main-branch>
+ ```
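`git symbolic-ref refs/remotes/origin/HEAD` prints a full ref such as `refs/remotes/origin/main`. If you script around it, strip the prefix to get the branch name. In this sketch, the fallback to `main` when the ref is unset is an assumption, not documented specpipe behavior.

```javascript
const { execFileSync } = require("child_process");

// Turn "refs/remotes/origin/main\n" into "main"; null if the output is unexpected.
function branchFromSymbolicRef(output) {
  const prefix = "refs/remotes/origin/";
  const ref = output.trim();
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : null;
}

// Ask git for origin's default branch; fall back when it cannot be determined.
function defaultBranch(fallback = "main") {
  try {
    const out = execFileSync("git", ["symbolic-ref", "refs/remotes/origin/HEAD"], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    return branchFromSymbolicRef(out) ?? fallback;
  } catch {
    return fallback; // git missing, not a repo, or origin/HEAD unset
  }
}
```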
+
+ ### Path guard blocking a legitimate command
+
+ **Symptom:** Claude can't run a command you need.
+
+ **Fix:** The path guard deliberately blocks broad patterns. If you need to access `build/` for a specific reason, run the command directly in your terminal rather than through Claude Code.
+
+ ### File guard warning on generated files
+
+ **Symptom:** The file guard warns about generated files that exceed the line threshold.
+
+ **Fix:** Add their patterns to `FILE_GUARD_EXCLUDE`:
+ ```bash
+ export FILE_GUARD_EXCLUDE="*.generated.swift,*.pb.go,*.min.js,*.snap"
+ ```
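Each `FILE_GUARD_EXCLUDE` entry is a glob. The hypothetical matcher below supports only `*` (any run of non-slash characters) and treats every other character literally. It matches against the file's basename rather than its full path, which is an assumption about how the real guard behaves.

```javascript
// Hypothetical matcher for FILE_GUARD_EXCLUDE; supports only the "*" wildcard.
const path = require("path");

// Convert a "*"-only glob to an anchored RegExp, escaping everything else.
function globToRegExp(glob) {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`^${escaped}$`);
}

// True when the file's basename matches any comma-separated exclude glob.
function isExcluded(filePath, excludeEnv = process.env.FILE_GUARD_EXCLUDE ?? "") {
  const name = path.basename(filePath);
  return excludeEnv
    .split(",")
    .map((g) => g.trim())
    .filter(Boolean)
    .some((g) => globToRegExp(g).test(name));
}
```

With the export above, `isExcluded("Sources/Api.generated.swift")` would return true, while `src/app.ts` would still be checked.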
+
+ ---
+
+ ## 11. FAQ
+
+ **Q: Do I need specs for every tiny change?**
+ A: No. Changes under 5 lines with no behavior change can skip the spec. Just `/sp-build` and `/sp-commit`. The spec-first rule is for meaningful behavior changes.
+
+ **Q: Can I use mocks in tests?**
+ A: Only for external services you can't run locally (third-party APIs, email services). Never mock your own code or database just to make tests pass faster.
+
+ **Q: What if Claude writes a test that tests the wrong thing?**
+ A: This usually means the spec is ambiguous. Clarify the spec first, then re-run `/sp-build`. Good specs produce good tests.
+
+ **Q: Can I use this with other AI coding tools?**
+ A: Yes. `specpipe init --agents <list>|all` installs the skills for Codex, Cursor, Antigravity, OpenClaw, and Hermes, each in its native format. Guards are hook-*enforced* for Claude, Codex, and Cursor (`.codex/hooks.json` / `.cursor/hooks.json` can block tool calls); Antigravity, OpenClaw, and Hermes get them as always-on advisory rules. The specs and workflow are tool-agnostic. See [docs/multi-agent.md](docs/multi-agent.md).
+
+ **Q: When should I use `/sp-challenge`?**
+ A: After `/sp-plan`, for complex features involving authentication, payments, data pipelines, or multi-service integration. It spawns parallel hostile reviewers that look for security holes, failure modes, and false assumptions BEFORE you write code. Skip it for simple CRUD or small features; the overhead isn't worth it.
+
+ **Q: How do I do a full coverage audit?**
+ A: This is intentionally not a command (it's expensive and rare). When needed, prompt Claude directly: "Audit test coverage for feature X against docs/specs/X/X.md acceptance scenarios. Identify gaps and write missing tests."
+
+ **Q: What if my project uses multiple languages?**
+ A: The skills auto-detect the test command from the first project marker they find. For monorepos, run `/sp-build` from each sub-project directory, or pin the test command per project in `.claude/CLAUDE.md` under **Testing**.
+
+ **Q: Can I add more skills?**
+ A: Yes. Create `.claude/skills/<name>/SKILL.md` and it becomes available as the slash command `/<name>`. See [Customization](#8-customization).
+
+ **Q: How do I update the kit in existing projects?**
+ A: Run `npx specpipe upgrade`. It detects which files you've customized and updates only the files you haven't changed. Use `--force` to overwrite everything.
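The customization check can be modeled as follows. At install time, record a hash of each kit file. At upgrade time, overwrite a file only if its current hash still matches the recorded one. The sketch uses SHA-256 via Node's `crypto` module. The helper names and the rule for untracked files are assumptions, not specpipe's actual manifest logic.

```javascript
const crypto = require("crypto");

// SHA-256 hex digest of a file's contents.
function sha256(contents) {
  return crypto.createHash("sha256").update(contents).digest("hex");
}

// Decide what an upgrade should do with one kit file.
//   recordedHash: hash stored in the manifest at install time (undefined if untracked)
//   currentContents: what is on disk now
function upgradeAction(recordedHash, currentContents, force = false) {
  if (force) return "overwrite"; // like --force: replace even customized files
  // Unchanged since install: overwriting loses nothing.
  // Customized or untracked: keep the user's version.
  return sha256(currentContents) === recordedHash ? "overwrite" : "skip";
}
```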
+
+ **Q: What's the HTML view next to my spec, and how do I generate it?**
+ A: It's a scannable view of the spec: sidebar TOC, story cards, collapsible acceptance scenarios, and a dark/light theme. Reading a 1000-line spec in a Markdown editor is painful; the HTML is what a tired human can actually skim. Generate or refresh it with `/sp-spec-render <feature>`. `/sp-plan` does not create it automatically; it only suggests the command at the end. The `.md` file remains the source of truth (the AI and `/sp-build` read it, and git diffs work normally). The `.html` is a regenerable artifact: never edit it by hand, and let `/sp-spec-render` rebuild it instead. You can email or Slack the HTML to PMs and stakeholders who don't want to clone the repo.
+
+ **Q: I installed with the old setup.sh — how do I migrate?**
+ A: Run `npx specpipe init --adopt .` to generate a manifest from your existing files without overwriting anything. Future upgrades will then work normally.